Do ‘leaders’ sound different from ‘laggers’? Exploring the perceptual similarity of New Zealand English voices: Supplementary Materials

Authors and Affiliations
Elena Sheard

New Zealand Institute of Language, Brain and Behaviour, University of Canterbury

Jen Hay

New Zealand Institute of Language, Brain and Behaviour, University of Canterbury

Department of Linguistics, University of Canterbury

Joshua Wilson Black

New Zealand Institute of Language, Brain and Behaviour, University of Canterbury

Lynn Clark

New Zealand Institute of Language, Brain and Behaviour, University of Canterbury

Department of Linguistics, University of Canterbury

Published

March 12, 2025

1 Overview

This is the current version of the supplementary materials for the manuscript “Do ‘leaders’ sound different from ‘laggers’? Exploring the perceptual similarity of New Zealand English voices”. The manuscript presents the results from an online pairwise similarity rating task, in which New Zealanders rated the perceived similarity of pairs of New Zealand English speakers. The pre-registration for this analysis can be accessed here.

The materials have the following structure:

  • Section 2 loads the R packages and data frames used in the analysis

  • Section 3 presents transcripts of the audio stimuli used in the pairwise similarity rating task, and a stock take of the monophthongs present in each stimulus.

  • Section 4 describes how the stimuli pair subsets were distributed across participants in the pairwise rating task.

  • Section 5 contains the code for the results and figures reported in the manuscript.

  • Section 6 presents the pre-registered correlations between the MDS dimensions and independent variables.

  • Finally, Section 8 discusses our data filtering decisions and applies the same analyses as Section 5 to the experiment data filtered according to the pre-registration.

If you would like to see how we scaled participant similarity ratings and created the similarity matrices used in this file, please refer to the Generate-data-frames.qmd script in the GitHub repository or at this link.

2 Libraries and data frames

The following code chunks load:

  1. Required libraries.
# Data manipulation and visualisation
library(tidyverse)

# Set theme for visualisations
theme_set(theme_bw())

# Visualisations
library(ggcorrplot) # correlation plots
library(corrplot) # for the cor.mtest() function.
library(cowplot) # function plot_grid
library(rpart.plot) # visualise regression tree output
library(gratia) # visualise bam model
library(magick) # manipulate .png files
library(ggalt) # gg_encircle function

# Statistical analyses
# MDS
library(smacof) # Apply MDS
library(rsample) # bootstrapping
library(vegan) # procrustes transformation of MDS
# Using the development version of nzilbb.vowels;
# any version after 0.3.1 will work.
# To install the development version, run the following line.
# You may also need to install the 'remotes' package.
# remotes::install_github('nzilbb/nzilbb_vowels')
library(nzilbb.vowels) # Assess MDS fit

# Decision trees
library(tidymodels) # fit decision trees
library(parsnip) # fit decision trees and random forests
library(randomForest) # run random forests
library(rpart) # run regression trees
library(vip) # assess importance of predictors in random forests

# GAMMs
library(mgcv) # fit bam models

# Other
library(here) # localised file paths
library(knitr)
library(kableExtra) # html tables when rendering
library(grateful) # write package citations at end of document
  2. Similarity matrices (for both reported and pre-registered filtering).

  3. Individual pairwise similarity ratings (for both reported and pre-registered filtering).

  4. Converts the similarity matrices to dissimilarity matrices for MDS.

Click here to view code.
# Load scaled similarity matrix from experimental results.
# Generated in Generate-data-frames.qmd
# 7 participants removed based on reported filtering
# Individual pairwise ratings scaled per participant
# Then mean taken for each stimuli pair
matrix_scaled_ID <- read_rds(
  here(
    "Data",
    "PW_matrix_scaled_anon_250124.rds"
  )
)

# Load scaled matrix based on pre-registered filtered data
# 4 participants removed
# Individual pairwise ratings scaled per participant
# Then mean taken for each stimuli pair
matrix_scaled_prereg <- read_rds(
  here(
    "Data",
    "PW_matrix_scaled_PR_anon_250124.rds"
  )
)

# Load individual scaled and unscaled similarity ratings based on reported
# filtering. Also generated in 5_matrix generation.
PW_ratings_ID <- read_rds(
  here(
    "Data",
    "PW_ratings_scaled_filtered_anon_250124.rds"
  )
)

# Load individual pairwise similarity ratings based on pre-registered filtering
PW_ratings_prereg <- read_rds(
  here(
    "Data",
    "PW_ratings_scaled_PR_anon_250124.rds"
  )
)

# Load unfiltered pairwise similarity ratings. 'Filtered' in the label refers
# to filtering for eligible participants under the participation criteria.
PW_ratings_unfiltered <-
  read_rds(
    here(
      "Data",
      "PW_ratings_scaled_unfiltered_anon_250124.rds"
    )
  )

# Convert similarity matrices to dissimilarity matrices for MDS.
df_prereg <- sim2diss(matrix_scaled_prereg, method = "reverse")

df_ID <- sim2diss(matrix_scaled_ID, method = "reverse")
  5. Additional data frames used in the analysis, specifically:
  • QB1 PCA scores

  • Measurements of articulation rate and pitch

# Load acoustic measures (PC scores, articulation rate, mean pitch)
# Generated in Additional_data_for_analysis.qmd file
measures_df <- read_rds(here(
  "Data",
  "QB1_scores_38_anon_250121.rds"
))

# Swap PC1 (leader-lagger PC score) so that higher scores correspond to being a leader in change
measures_df <- measures_df %>%
  mutate(PC1_swapped = PC1 * (-1))

# Scale variables of interest
measures_df <- measures_df %>%
  mutate(
    LeaderLaggerScore = scale(PC1_swapped),
    BackVowelScore = scale(PC2),
    ArticRate = scale(articulation_rate),
    SpeechRate = scale(speech_rate_update),
    MeanPitch = scale(pitch_cross_75_500)
  )

# Load vowel counts for audio stimuli
VowelCount <- read_rds(here(
  "Data",
  "StimuliVowelFrequency_anon.rds"
))

2.1 Session information and additional functions

Here are the full details of the system and packages used to run this analysis:

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_New Zealand.utf8  LC_CTYPE=English_New Zealand.utf8   
[3] LC_MONETARY=English_New Zealand.utf8 LC_NUMERIC=C                        
[5] LC_TIME=English_New Zealand.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] grateful_0.2.10      kableExtra_1.4.0     knitr_1.46          
 [4] here_1.0.1           mgcv_1.9-1           nlme_3.1-164        
 [7] vip_0.4.1            randomForest_4.7-1.1 yardstick_1.3.1     
[10] workflowsets_1.1.0   workflows_1.1.4      tune_1.2.1          
[13] recipes_1.0.10       parsnip_1.2.1        modeldata_1.3.0     
[16] infer_1.0.7          dials_1.2.1          scales_1.3.0        
[19] broom_1.0.5          tidymodels_1.2.0     nzilbb.vowels_0.3.1 
[22] patchwork_1.2.0      vegan_2.6-4          lattice_0.22-6      
[25] permute_0.9-7        rsample_1.2.1        smacof_2.1-6        
[28] e1071_1.7-14         colorspace_2.1-0     plotrix_3.8-4       
[31] ggalt_0.4.0          magick_2.8.5         gratia_0.9.0        
[34] rpart.plot_3.1.2     rpart_4.1.23         cowplot_1.1.3       
[37] corrplot_0.92        ggcorrplot_0.1.4.1   lubridate_1.9.3     
[40] forcats_1.0.0        stringr_1.5.1        dplyr_1.1.4         
[43] purrr_1.0.2          readr_2.1.5          tidyr_1.3.1         
[46] tibble_3.2.1         ggplot2_3.5.0        tidyverse_2.0.0     

loaded via a namespace (and not attached):
  [1] backports_1.4.1     Hmisc_5.1-2         systemfonts_1.0.6  
  [4] splines_4.2.2       listenv_0.9.1       candisc_0.8-6      
  [7] digest_0.6.35       foreach_1.5.2       htmltools_0.5.8.1  
 [10] gdata_3.0.0         fansi_1.0.6         magrittr_2.0.3     
 [13] checkmate_2.3.1     cluster_2.1.6       doParallel_1.0.17  
 [16] tzdb_0.4.0          globals_0.16.3      gower_1.0.1        
 [19] extrafont_0.19      wordcloud_2.6       svglite_2.1.3      
 [22] extrafontdb_1.0     hardhat_1.3.1       timechange_0.3.0   
 [25] ggrepel_0.9.5       pan_1.9             rbibutils_2.2.16   
 [28] xfun_0.43           jsonlite_1.8.8      lme4_1.1-35.3      
 [31] survival_3.5-8      iterators_1.0.14    glue_1.7.0         
 [34] gtable_0.3.4        ipred_0.9-14        nnls_1.5           
 [37] proj4_1.0-14        car_3.1-2           weights_1.0.4      
 [40] Rttf2pt1_1.3.12     future.apply_1.11.2 shape_1.4.6.1      
 [43] maps_3.4.2          jomo_2.7-6          abind_1.4-5        
 [46] Rcpp_1.0.12         viridisLite_0.4.2   htmlTable_2.4.2    
 [49] GPfit_1.0-8         foreign_0.8-86      proxy_0.4-27       
 [52] Formula_1.2-5       lava_1.8.0          prodlim_2023.08.28 
 [55] heplots_1.6.2       glmnet_4.1-8        htmlwidgets_1.6.4  
 [58] RColorBrewer_1.1-3  mice_3.16.0         pkgconfig_2.0.3    
 [61] nnet_7.3-19         utf8_1.2.4          tidyselect_1.2.1   
 [64] rlang_1.1.3         DiceDesign_1.10     polynom_1.4-1      
 [67] munsell_0.5.1       tools_4.2.2         cli_3.6.2          
 [70] generics_0.1.3      evaluate_0.23       fastmap_1.1.1      
 [73] yaml_2.3.8          rgl_1.3.1           mitml_0.4-5        
 [76] future_1.33.2       ash_1.0-15          mvnfast_0.2.8      
 [79] xml2_1.3.6          compiler_4.2.2      rstudioapi_0.16.0  
 [82] ggokabeito_0.1.0    lhs_1.1.6           stringi_1.8.3      
 [85] Matrix_1.6-5        nloptr_2.0.3        vctrs_0.6.5        
 [88] pillar_1.9.0        lifecycle_1.0.4     furrr_0.3.1        
 [91] Rdpack_2.6          data.table_1.15.4   R6_2.5.1           
 [94] KernSmooth_2.23-22  gridExtra_2.3       parallelly_1.37.1  
 [97] codetools_0.2-20    boot_1.3-30         MASS_7.3-60.0.1    
[100] gtools_3.9.5        rprojroot_2.0.4     withr_3.0.1        
[103] parallel_4.2.2      hms_1.1.3           grid_4.2.2         
[106] timeDate_4032.109   gghalves_0.1.4      class_7.3-22       
[109] minqa_1.2.6         rmarkdown_2.26      carData_3.0-5      
[112] base64enc_0.1-3     ellipse_0.5.0      

The code chunk below contains a custom function that creates a new column indicating whether a value in a specified column is more than a given number of standard deviations above the mean, more than that number below the mean, or in between, and applies a label to each of the three categories.

# Function to classify a numeric variable as categorically low, middle, or
# high in its distribution, based on a threshold of `value` standard
# deviations from the mean. `options` gives the labels as c(low, middle, high).
sd_calculate <- function(orig_column, value, options) {
  case_when(
    {{ orig_column }} > mean({{ orig_column }}, na.rm = TRUE) + 
      value * sd({{ orig_column }}, na.rm = TRUE) ~ 
      options[3],
    {{ orig_column }} < mean({{ orig_column }}, na.rm = TRUE) - 
      value * sd({{ orig_column }}, na.rm = TRUE) ~ 
      options[1],
    TRUE ~ options[2]
  )
}

# Example usage inside mutate() (illustrative column name):
# df %>% mutate(PitchGroup = sd_calculate(MeanPitch, 1, c("Low", "Mid", "High")))

The code chunk below contains a custom function that creates a correlogram in which the values in the lower triangle of the plot represent the correlation coefficients and the values in the upper triangle represent the p-values.

correlogram_function <- function(in_data) {
  data_matrix <- as.matrix(in_data)

  pmat <- cor.mtest(data_matrix, method = "spearman")
  pvalues <- pmat$p
  pvalues[upper.tri(pvalues)] <- NA

  pvalues_long <- pvalues %>%
    as.data.frame() %>%
    tibble::rownames_to_column("Variable1") %>%
    gather("Variable2", "value", -Variable1) %>%
    mutate(
      value = round(value, digits = 2),
      value = case_when(Variable1 == Variable2 ~ NA, TRUE ~ value)
    )

  test <- cor(data_matrix, method = "spearman")
  corr_coeff <- test
  corr_coeff[lower.tri(corr_coeff)] <- NA

  corr_coeff_long <- corr_coeff %>%
    as.data.frame() %>%
    tibble::rownames_to_column("Variable1") %>%
    gather("Variable2", "value", -Variable1) %>%
    mutate(
      value = round(value, digits = 2),
      value = case_when(Variable1 == Variable2 ~ NA, TRUE ~ value),
      value = gsub("0\\.", ".", value) # drop the leading zero ("0.45" -> ".45")
    )

  correlogram_plot <- ggcorrplot(
    test,
    p.mat = pmat$p,
    sig.level = 0.05,
    insig = "pch",
    pch = "",
    method = "square",
    outline = T,
    type = "full",
    show.diag = T,
    ggtheme = ggplot2::theme_bw
  )

  x_labs <-
    ggplot_build(correlogram_plot)$layout$panel_params[[1]]$x$get_labels()

  correlogram_plot <- correlogram_plot +
    scale_y_discrete(limits = rev) +
    scale_x_discrete(position = "top", labels = x_labs) +
    theme(axis.text.x = element_text(angle = 60, hjust = -0.05)) +
    geom_text(
      aes(
        x = pvalues_long$Variable1,
        y = pvalues_long$Variable2,
        label = pvalues_long$value
      ),
      size = 2,
      fontface = "bold"
    ) +
    geom_text(
      aes(
        x = corr_coeff_long$Variable1,
        y = corr_coeff_long$Variable2,
        label = corr_coeff_long$value
      ),
      size = 2
    )

  correlogram_plot
}

3 Audio stimuli

3.1 Stimuli selection

This section discusses the selection of the audio stimuli used in the task. The stimuli were selected from a subset of the monologues in the QuakeBox corpus, specifically from Pākehā1 women aged 46-55. To be selected from a monologue, a clip had to contain at least half of the 10 monophthongs. Stimuli were also selected based on content; many, if not all, of the stories told in the corpus have upsetting aspects which we did not want to expose our participants to, especially as some of them may have experienced the earthquake themselves. As such, we focused on more positive parts of the recordings, like the sense of community or amusing aspects of the speakers' experiences, and, where this was not possible, we ensured that the clips were not explicitly negative (i.e., while some clips mention damage to property, none mention death or traumatic personal events). We also ensured that the clips did not contain information that would indicate the speakers' social backgrounds (e.g., occupation, specific suburbs, schools). The content of one stimulus (30) was edited to remove references to the speaker's china plates. In some instances, inter-word pauses were also reduced to bring the length of the clip below 10 seconds. Finally, we normalised the intensity of all clips to 70 dB in Praat (Boersma 2001) to ensure consistency for listeners.

3.2 Transcriptions and vowel stock take

Table 1 contains the transcriptions for all 38 audio stimuli used in the online task. The table indicates whether each of the 10 NZE monophthongs considered in Brand et al. (2021) is present in each stimulus (1 = present, 0 = absent).2 The table also counts the total number of monophthongs represented in each stimulus (e.g., a 5 indicates that 5 of the 10 monophthongs are present) and the total number of stimuli in which each monophthong is represented. At least 6 of the 10 monophthongs are present in each stimulus (median = 8, mean = 8.13). The vowels from the leader-lagger continuum are more reliably represented than the vowels in the back-vowel configuration, largely due to the low occurrence of START across the stimuli.

Leader-Lagger
Back Vowel
SpeakerID Stimulus TRAP DRESS KIT FLEECE NURSE STRUT GOOSE LOT THOUGHT START Total
1 We tuned into the National Radio and the National Radio woman was fantastic she - she really deserves a - a bit of an award of how she handled it 1 0 1 1 1 0 1 1 1 0 7
2 Everybody has gone through a lot of here - lot of problems here in Christchurch and we need to keep going 1 1 1 1 1 0 1 1 0 0 7
3 Our older dog was a lot better actually because she's very deaf and quite blind so in fact that helped her a bit although obviously she knew something was up and she was outside, as well 1 1 1 1 1 1 1 1 1 0 9
4 like deadliest catch I thought I was out in the middle of the sea with great huge waves and the noise was tremendous 1 1 1 1 0 0 1 1 1 0 7
5 (the) only better thing I can think is the wonderful - relationships w- and people that I have come to know so much - so much better 1 1 1 1 0 1 1 0 0 0 6
6 we sort of found room for everybody and everybody crashed the night um I don't think anybody slept we all slept in our clothes 1 1 1 1 0 1 1 1 1 0 8
7 I was listening to National Radio and there was a wonderful um person there she was she was sort of making us all stay calm and things she was really good actually 1 0 1 1 1 1 1 1 1 1 9
8 I couldn't get there cos there were too many cars and no-one everyone was just crawling like snails 1 1 0 1 1 1 1 1 1 1 9
9 listening to the radio and all sitting round and talking so there were some good times and we had some laughs then as well 1 1 1 1 1 1 1 0 1 1 9
10 they brought a bit of cutlery a plate because of course there was no water and we all just got together and um had this enormous meal and it was you know it was kind of nice um in a way 1 1 1 1 0 1 1 1 1 0 8
11 I still have the people and the community um and one day those buildings'll be back in one form or another and in the meantime it's still a nice little community to be in 1 0 1 1 0 1 1 0 1 0 6
12 and i was trying really hard just to concentrate on the driving cos it was sort of the roads were starting to get quite sort of busy 1 1 1 1 1 1 1 1 1 1 10
13 we've gained friends that we didn't know, who've come together under earthquake circumstances 1 1 1 1 1 1 1 0 0 1 8
14 so at the end of it after two years we - we were in a very good position and we're both pretty happy with with what's happened for us 1 1 1 1 1 1 0 1 1 1 9
15 and I went to go out in my car to go and find him and I got told it's not worth going out because there was so much traffic, there was so much damage 1 1 1 0 1 1 1 1 0 1 8
16 I was with a group of friends and we had decided to leave at the end of the night and we walked out to shaking in the street 1 1 1 1 0 0 1 1 1 0 7
17 there were so many people who came and helped from uh workmates to family to people who you don't even know 1 1 0 1 1 0 1 1 0 1 7
18 that was the th~ amazing thing about the earthquakes is that people actually asked if you were okay people wanted to know if things were okay . 1 0 1 1 1 0 1 0 0 1 6
19 and in the end my dau~ daughter who is inc~ incredibly logical um just went to the phone book and tore out the back pages and we just slowly wor~ worked down the the list of things that we had to do . 1 1 1 1 1 1 1 1 1 0 9
20 was quite hard to comprehend that this has happened as we're driving along in lovely weather and life is normal 1 1 1 1 1 1 1 1 1 1 10
21 just immediately went into basically kiwi camping mode and I think that's what a lot of people did and it was just kind of second nature to lots of to to know how to do that 1 1 1 1 0 1 1 1 0 0 7
22 for the September earthquake we were snuggled in bed and um we did all the things that we'd been taught as as children 1 1 1 1 1 1 1 0 1 0 8
23 having said there were builders working on the building and they picked up all the children. The builders were great and they carried all the children into Latimer Square 1 1 1 1 1 1 0 1 1 0 8
24 my initial thought was that is was something like an alien invasion because I had a sense that there was a- you know movement coming up from underneath us 1 1 1 1 0 1 1 1 1 0 8
25 we found torches and a transistor radio, I turned on the transistor radio and there was um national program. on there was this really nice announcer and she just got information through from people 1 0 1 1 1 1 1 1 1 0 8
26 I think I've got to a point now you just sorta look up check everything's alright nothing's damaged no one's hurt and just get . back on with it again 1 1 1 1 1 1 1 1 1 0 9
27 I saw people walking past me so I was thinking I was thinking to myself well I'm gonna stop my car on the side of the road and walk because it's gonna be quicker . 1 1 1 1 0 0 1 1 1 1 8
28 the afternoon was spent ahh gathering regrouping ourselves checking on . neighbours who were ahh shocked 1 1 1 1 1 0 1 1 0 1 8
29 and here we have these four really nice council guys very big gentlemen . come over and here they are carrying this wedding dress across the road . struggling not to fall over . 1 1 1 1 0 1 1 1 1 1 9
30 um . I was seeing my best friend that I hadn't seen for three years she was coming over from Honolulu so they wanted a real she wanted a real Sunday roast she'd been living in America 1 1 1 1 0 1 1 1 1 0 8
31 I'm starting to feel . now some sense of . contentment and that it is my home again and it's got that feeling . of being home 1 1 1 1 0 1 1 1 0 1 8
32 the next you know th~ few hours after that that horrible feeling of feeling um . sick and shakey and . um . that sort of feeling was awful but anyway life went on 1 1 1 1 0 1 1 0 1 1 8
33 and all the cobble stones you know had to lift my feet because the cobblestones started . popping up . and I thought oh . that you know that might hurt if you got your foot . stuck or caught in that 1 0 1 1 1 1 1 1 1 1 9
34 I always thought that even if I had had to walk all that way home . you could have trusted people cos people were so kind . and everyone said are you okay where are you walking to - 1 1 1 1 0 1 1 1 1 1 9
35 so I managed to make nice coffee and some um . French toast and we found it c~ you know sorta felt quite guilty about having such delicious breakfast under such circumstances 1 1 1 1 1 1 1 1 1 0 9
36 we've got a really lovely um . street where it's more of a village . a c~ and a community and people . really . stuck together and helped each other 1 1 1 1 0 1 1 1 1 0 8
37 I stayed with my neighbours she cooked us food we all got together . it's funny when you don't know your neighbours that well . you bond pretty well - 1 1 1 1 0 1 1 1 1 0 8
38 and and also just caring for people people . are a lot friendlier in Christchurch . we find people . are doing more for each other . . 1 1 1 1 1 1 1 1 1 1 10
Total 38 32 36 37 22 30 36 31 29 18 NA
Table 1: Stimuli transcripts and vowel counts

4 Stimuli distribution across participants

This section details how the 703 possible pairwise combinations of the 38 stimuli were distributed across participants in the experiment design. Each participant listened to a subset of the possible combinations, because listening to all of them would take multiple hours. Longer experiment times can reduce participant engagement (Galesic and Bosnjak 2009) and lead to more unreliable responses (e.g., Baer et al. 1997; Berry et al. 1992). As such, each participant listened to two ‘blocks’ of 19 stimuli pairs, with each speaker heard once per block. A simple random sampling of stimuli for each participant would have resulted in an uneven distribution of rating counts across pairs, with some pairs likely never rated at all. We instead used a semi-random sampling procedure that distributed the possible combinations of stimuli pairs as evenly as possible across the stimuli subsets participants heard. In effect, all 703 stimuli pairs are heard once for approximately every 18 participants. We note that participant drop-out meant that, in practice, the overall distribution of the possible pairwise combinations was not as even as intended. The procedure was as follows:

  1. An initial block of 19 stimuli pairs is created by randomly sorting all 38 stimuli, then assigning the first half of the list to the second half (i.e., the first stimulus would be paired with the twentieth stimulus, the second with the twenty-first). Each speaker is, therefore, only represented once in the block.

  2. The stimuli pairs included in this initial block are removed from the list of 703 possible pairwise combinations.

  3. The stimuli in the reduced list of available pairwise combinations are shuffled again. The next block of 19 pairs is selected from this shuffled list, with the condition that each stimulus is again represented only once in the block.

  4. The stimuli pairs in this next block of 19 pairs are then removed from the reduced list of possible pairwise combinations.

  5. Steps 3 and 4 are repeated until as many as possible of the 703 pairwise combinations have been assigned to complete blocks of 19 pairs.

  6. Steps 1 to 5 are repeated to create multiple ‘batches’, where each batch is a collection of unique blocks (of 19 stimuli pairs) that cover as many of the possible 703 pairwise combinations as possible.

    • While it is, hypothetically, possible for the 703 pairs to be evenly divided into 37 unique blocks of 19 stimuli pairs, the requirement that each speaker be only heard once per block meant that there was a range of 33 to 37 unique blocks (i.e., 627-703 stimuli pairs) per batch.
    • Nonetheless, at least 90% of possible pairs were accounted for in each batch, making it unlikely that the pairs not included in one batch are also not included in others.
  7. The blocks in the first half of each batch were then assigned to List A, and the blocks in the second half of each batch were assigned to List B. Each participant listens to a List A and List B block from the same batch.

    • In a case where a batch has 36 blocks, this means that all pairs in this batch would (in theory) be listened to and rated across 18 participants (2 blocks per listener). The order the audio stimuli were presented to individual listeners was then randomised, as was the order of the stimuli pairs.
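The steps above can be sketched as follows. This is a Python illustration under our own function names, not the original experiment code; note that the greedy pass can fail to complete a block even when one exists, which mirrors the 33-37 blocks per batch noted above:

```python
import itertools
import random

def next_block(remaining_pairs, stimuli, rng):
    """Steps 3-4: shuffle the remaining pairs, then greedily select pairs
    until every stimulus appears exactly once (one complete block)."""
    pairs = list(remaining_pairs)
    rng.shuffle(pairs)
    block, used = [], set()
    for a, b in pairs:
        if a not in used and b not in used:
            block.append((a, b))
            used.update((a, b))
            if len(block) == len(stimuli) // 2:
                return block
    return None  # no complete block could be assembled from what is left

def build_batch(stimuli, rng):
    """Steps 1-5: assign as many of the possible pairwise combinations as
    possible to unique blocks in which each speaker is heard exactly once."""
    stimuli = list(stimuli)
    remaining = set(itertools.combinations(sorted(stimuli), 2))
    blocks = []
    while (block := next_block(remaining, stimuli, rng)) is not None:
        blocks.append(block)
        remaining -= set(block)
    return blocks, remaining

# 38 stimuli give 703 possible pairs; the greedy pass usually leaves a
# small remainder of unassignable pairs rather than dividing all 703
# pairs into 37 perfect blocks.
blocks, leftover = build_batch(range(38), random.Random(1))
```

Splitting the resulting blocks into List A and List B (step 7) is then a matter of halving the list of blocks.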

5 Code for reported results

5.1 Fitting the reported MDS (Spline)

Like Principal Component Analysis, Multi-dimensional Scaling (MDS) is a dimension-reduction technique. While its implementation in sociolinguistics is comparatively limited (e.g., Clopper and Pisoni 2007; Clopper and Bradlow 2009), the technique has a strong precedent of application to questions of perceived similarity in disciplines such as psychology and forensic linguistics. This is because MDS is highly suited to (dis)similarity data; it reduces measures of (dis)similarity between pairs of objects (in our case, speakers) to a smaller number of perceptual dimensions. In theory, the resulting perceptual space corresponds to the cues or factors in production that listeners use to perceptually differentiate between speakers (e.g., pitch, vowel patterns).

MDS requires a dissimilarity matrix as input. We first generated a square pairwise 38 x 38 matrix (one row and one column per stimulus) representing the similarity between each pair of speakers obtained from our experimental data (see Generate-data-frames.qmd). We scaled the pairwise similarity ratings per participant, and then took the mean rating for each stimuli pair across all participants. However, as MDS requires all numbers in the input matrix to be above 0, the individual scaled ratings were first brought above 0 by adding the absolute value of the minimum score to all ratings before calculating the mean. This mean scaled similarity rating is the value in the similarity matrix. The similarity matrix was then converted to a dissimilarity matrix using the ‘reverse’ option of the sim2diss() function in the smacof package (Mair, Groenen, and Leeuw 2022). Using this dissimilarity matrix, we implemented an M-spline MDS with five knots and third-degree polynomials. This is one way to allow for a non-linear relationship between the perceptual dissimilarity ratings and the distances between speakers generated by MDS; there is no a priori reason to assume that this relationship is linear.
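The scaling pipeline can be sketched as follows. This is a Python illustration of the steps described above, not the actual implementation (which lives in Generate-data-frames.qmd); the function name is ours, and we read "adding the minimum score" as shifting by the absolute value of the minimum so that all values sit above zero:

```python
from collections import defaultdict
from statistics import mean, stdev

def mean_scaled_similarity(ratings):
    """ratings: iterable of (participant, pair, similarity) tuples.
    Z-score ratings within each participant (assumes each participant
    gave at least two distinct ratings), shift all scaled values above
    zero, then average per stimuli pair."""
    by_participant = defaultdict(list)
    for participant, pair, sim in ratings:
        by_participant[participant].append((pair, sim))
    scaled = []
    for items in by_participant.values():
        vals = [s for _, s in items]
        m, sd = mean(vals), stdev(vals)
        scaled += [(pair, (s - m) / sd) for pair, s in items]
    shift = abs(min(v for _, v in scaled))  # MDS input must be non-negative
    by_pair = defaultdict(list)
    for pair, v in scaled:
        by_pair[pair].append(v + shift)
    return {pair: mean(vs) for pair, vs in by_pair.items()}
```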

5.1.1 Selecting the number of dimensions

To apply MDS, we must specify the number of dimensions we wish the data to be reduced to. Here, we introduce a novel method to help inform the choice of dimensions. This method is an alternative to relying solely on individual stress values. Stress is a measure of the fit of an MDS, with a lower stress value corresponding to a better fitting analysis. However, stress will always decrease as the number of dimensions increases (i.e., 4 dimensions will always have a lower stress than 3, which will always have a lower stress than 2). As such, we do not consider relying on a single stress value to be best practice (following Mair, Borg, and Rusch 2016).
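Stress itself has a simple closed form. As a point of reference, Kruskal's Stress-1 (the statistic reported by smacof) can be sketched in a few lines; this Python sketch is illustrative only and is not part of the analysis code:

```python
def stress_1(disparities, distances):
    """Kruskal's Stress-1: the square root of the residual sum of squares
    between the transformed dissimilarities (disparities) and the fitted
    configuration distances, normalised by the sum of squared distances."""
    num = sum((dh - d) ** 2 for dh, d in zip(disparities, distances))
    den = sum(d ** 2 for d in distances)
    return (num / den) ** 0.5

# A perfect fit gives zero stress.
stress_1([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Because adding a dimension always gives the configuration more room to fit the disparities, this quantity can only decrease with extra dimensions, which is why the comparison against a permuted baseline below is needed.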

We instead take an approach to dimension selection which is analogous to the permutation and bootstrapping approach developed by Viera (2012) and applied to vocalic co-variation by Wilson Black et al. (2023). We implemented this method via another custom function in the nzilbb.vowels R package.

Our aim is to ensure that we do not specify too few dimensions. To inform this decision, the function below compares the reductions in stress achieved by adding dimensions to the MDS analysis of our data with the reductions obtained by chance. Specifically, the function permutes the dissimilarity matrix (randomising the order of the values in each row) to break the association between speakers, and calculates the reduction in stress achieved with each additional dimension. By repeating the process 100 times, we can approximate the distribution of stress reduction we would expect were there no structure in the data. We compare this with the distribution of stress reduction we expect from the experimental data, generated by rerunning the analysis on 100 bootstrapped resamples of the data. We can therefore confirm that we include at least the number of dimensions that reduce stress more than would be expected by chance (i.e., more than in the permuted data).
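To make the null hypothesis concrete, the permutation step amounts to shuffling the values within each row of the dissimilarity matrix. The following Python sketch is our own illustration of that operation; the actual implementation is mds_test() in nzilbb.vowels, which may differ in detail:

```python
import random

def permute_within_rows(diss, rng):
    """Return a copy of a dissimilarity matrix with the values in each
    row shuffled, breaking the association between speakers while
    preserving each row's set of values (the null used by the test)."""
    return [rng.sample(row, len(row)) for row in diss]
```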

We plot the results of this function in Figure 1. This figure plots the reduction in stress as the number of dimensions increases for both the bootstrapped and randomised (permuted) data. Figure 1 suggests we need at least one dimension, with the black cross sitting somewhat higher than the null distribution for two dimensions. The test indicates we are not including too few dimensions if we run a two-dimensional MDS on this data.

We therefore run an MDS with two dimensions using the smacofSym() function from the smacof package (Mair, Groenen, and Leeuw 2022). The space defined by these two dimensions theoretically corresponds to the acoustic cues listeners used when grouping the speakers.

set.seed(10)
full_test <- mds_test(
  df_ID,
  n_boots = 100,
  n_perms = 100,
  test_dimensions = 5,
  mds_type = "mspline",
  spline_degree = 3,
  spline_int_knots = 5
)

plot_mds_test(full_test) +
  labs(
    y = "Reduction in Stress", 
    x = "Number of Dimensions in MDS Analysis", 
    colour = "Data"
  )

Figure 1: Boxplots depict stress reduction as additional dimensions are added for bootstrapped samples (red) and permuted samples (blue). Stress reduction in the experimental data is depicted by black crosses.

5.1.2 Selecting best-fitting analysis based on best random start

Following Mair, Borg, and Rusch (2016), the MDS reported in the manuscript is the random-start solution with the lowest stress value from 100 random starts. The code chunk below runs 100 MDS analyses with random starts (all two-dimensional M-spline MDS analyses, with five knots and third-degree polynomials) on our dissimilarity matrix.

set.seed(200)
fit_df_ID <- NULL
for (i in 1:100) {
  fit_df_ID[[i]] <- smacofSym(
    df_ID,
    ndim = 2,
    type = "mspline",
    principal = T,
    init = "random",
    spline.degree = 3,
    spline.intKnots = 5,
    itmax = 2000
  )
}
ind <- which.min(sapply(fit_df_ID, function(x) {
  x$stress
}))
fit_df_ID <- fit_df_ID[[ind]]
fit_df_ID

Call:
smacofSym(delta = df_ID, ndim = 2, type = "mspline", init = "random", 
    principal = T, itmax = 2000, spline.degree = 3, spline.intKnots = 5)

Model: Symmetric SMACOF 
Number of objects: 38 
Stress-1 value: 0.31 
Number of iterations: 384 

5.1.3 Informal significance test assessing fit of final MDS

Also following Mair, Borg, and Rusch (2016), we apply an informal significance test in Figure 2 to gain insight into the fit of the reported MDS. The permtest function from the smacof package calculates the stress for 500 permuted iterations of the data frame, with the distribution of the permuted stress values visualised in Figure 2. The solid line corresponds to the actual stress value of our reported MDS analysis, while the dotted line corresponds to the lower 5% quantile. As the actual stress value is lower than the dotted line, albeit marginally, it is lower than 95% of the stress values from the analyses run on the permuted data (i.e., informally equivalent to a p-value < 0.05).

Click here to view code.
set.seed(100)
nrep <- 500
res.perm_PW <- permtest(
  fit_df_ID,
  data = df_ID,
  method.dat = "maximum",
  nrep = nrep,
  verbose = FALSE
)
res.perm_PW

Call: permtest.smacof(object = fit_df_ID, data = df_ID, method.dat = "maximum", 
    nrep = nrep, verbose = FALSE)

SMACOF Permutation Test
Number of objects: 38 
Number of replications (permutations): 500 

Observed stress value: 0.31 
p-value: 0.032 
Click here to view code.
mperm <-
  mean(res.perm_PW$stressvec) ## permutation stress norm

perm5 <-
  quantile(res.perm_PW$stressvec, probs = 0.05) ## lower 5% quantile (critical value)

hist(
  res.perm_PW$stressvec,
  xlim = c(0.10, 0.40),
  xlab = "Stress Values",
  main = "Permutation Histogram"
)
abline(v = perm5, lty = 2) ## critical value (dashed)
abline(v = fit_df_ID$stress)

Figure 2: Stress values from 500 permuted iterations of MDS with 5% quantile (dashed line) and actual stress value of final MDS (solid line)

The insights from Figure 1 and Figure 2 provide us with sufficient evidence to support an MDS with two dimensions.

5.1.4 Extracting speaker scores

We can now extract the “coordinates” of each speaker from the results of the MDS and combine them with measures of speaker production from the QuakeBox.

Click here to view code.
conf_PW_ID <- fit_df_ID$conf
dimensions_PW <- as.data.frame(conf_PW_ID) %>%
  rename(D1_PW = V1, D2_PW = V2) %>%
  rownames_to_column(var = "Speaker")

# Join extracted scores with other data
measures_df <- measures_df %>%
  mutate(SpeakerId = as.character(SpeakerId)) %>%
  right_join(dimensions_PW, by = c("SpeakerId" = "Speaker"))

These coordinates capture each speaker’s position within the MDS space and can be mapped visually in two dimensions as in Figure 3 (Figure 1 in the manuscript). Stimuli that are close to one another (e.g., speakers 3 and 4) are perceived to sound more similar to one another, and stimuli that are far apart (e.g., speakers 30 and 22) are perceived to sound less similar to one another.

Click here to view code.
measures_df %>%
  ggplot(aes(x = D1_PW, y = D2_PW)) +
  geom_label(aes(label = SpeakerId), size = 3) +
  coord_fixed() +
  labs(x = "Dimension 1", y = "Dimension 2")

# Save figure 1
ggsave(
  path = here("Figures"),
  dpi = 300,
  filename = "Figure1.png",
  height = 950,
  width = 1150,
  units = "px"
)

Figure 3: MDS output coordinates for each speaker mapped in two dimensions

5.2 Predicting MDS dimensions with regression trees

To explore the features in production that distinguish between speakers within the MDS space, we fit two regression trees with stimulus D1 and D2 scores as the dependent variables and the following independent variables: speakers’ scores for their position in the leader-lagger continuum and back-vowel configuration (generated in Hurring et al. Accepted), and each stimulus’s articulation rate and mean pitch. All independent variables were scaled to facilitate comparability.

Unlike pairwise correlations (e.g. in Nolan, McDougall, and Hudson 2011; McDougall 2013) or linear regression, regression trees allow us to both explore the role of multiple variables at once in predicting D1 and D2 and capture potential non-linear relationships between variables.

We fit the regression trees using the parsnip package (Kuhn and Vaughn 2024) in R, with the mode set to regression and the engine set to rpart (Therneau, Atkinson, and Ripley 2023). To mitigate the high variance and instability of regression trees, we took a conservative approach to fitting them: a split only occurred if it improved the overall fit of the tree by at least 0.05 (the cost-complexity parameter). We also implemented random forests to evaluate the importance of each predictor in predicting each dimension. The random forests were also fit using parsnip in R, with the mode set to regression, the number of trees set to 1000, and the engine set to ranger (Wright and Ziegler 2017).

5.2.1 Predicting D1 with a regression tree

Figure 4 shows the reported result of the regression tree fit with Dimension 1 scores as the dependent variable. Regression trees partition data sets into smaller groups (‘nodes’) and then fit a model for each node based on the specified independent variables. The tree then creates an if-else statement for the most important predictor at each node, partitioning the data into further nodes until specified thresholds are met.

Click here to view code.
# Rename some variables for convenience
regression_df <- measures_df %>%
  select(
    LeaderLaggerScore,
    BackVowelScore,
    ArticRate,
    MeanPitch,
    D1_PW,
    D2_PW
  )

# Seed for reproducibility
set.seed(6)

# Set specifications for regression tree
tree_spec <-
  # Set model type
  decision_tree() %>%
  # Set mode
  set_mode("regression") %>%
  # Specify engine
  set_engine("rpart",
    control = rpart.control(cp = 0.05),
    method = "anova"
  )

# Fit tree predicting D1
d1_fit <- tree_spec %>%
  fit(
    D1_PW ~
      LeaderLaggerScore +
      BackVowelScore +
      ArticRate +
      MeanPitch,
    data = regression_df
  )

# Visualise output of D1 tree
d1_fit %>%
  extract_fit_engine() %>%
  rpart.plot(
    roundint = FALSE,
    extra = 101,
    box.palette = c("#f0496a", "#a541f7", "#A13F8A", "#F9A742", "#ffd117")
  )

Figure 4: Output of regression tree predicting Dimension 1

We can see, therefore, that mean pitch is the first node in the tree predicting D1. Speakers whose (scaled) mean pitch is above or equal to 0.96 (n = 8) are estimated to have a lower D1 score. Within the lower pitch speakers, laggers in change (n = 15) are estimated to have a lower D1 than leaders (n = 15).
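The tree’s splits can be read as nested if-else rules. The sketch below applies the two cutoffs reported in this section (scaled mean pitch at 0.96, and the leader-lagger cutoff of -0.065 used in the plotting code later on) to hypothetical scaled values; it is illustrative only, returning group labels rather than the fitted D1 node estimates.

```r
# Illustrative only: the D1 tree's decision rules as nested if-else
# statements, using the cutoffs reported in this section. The returned
# labels are group names, not fitted D1 values.
predict_d1_group <- function(mean_pitch, leader_lagger_score) {
  if (mean_pitch >= 0.96) {
    "high pitch"
  } else if (leader_lagger_score < -0.065) {
    "lower pitch lagger"
  } else {
    "lower pitch leader"
  }
}

predict_d1_group(1.2, 0)     # "high pitch"
predict_d1_group(0.1, -0.5)  # "lower pitch lagger"
predict_d1_group(0.1, 0.4)   # "lower pitch leader"
```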

The next code chunk saves the plot.

Click here to view code.
ppi <- 300
png(
  here(
    "Figures",
    "D1RegressionTree2.png"
  ),
  width = 1900,
  height = 1350,
  res = ppi
)
d1_fit %>%
  extract_fit_engine() %>%
  rpart.plot(
    roundint = FALSE,
    extra = 101,
    box.palette = c("#f0496a", "#a541f7", "#A13F8A", "#F9A742", "#ffd117"),
    cex = 1.25
  )
dev.off()
png 
  2 

Usefully for us, the if-else statements at each node function as ‘cutoffs’ that should delimit groups of speakers within the MDS space (i.e., speakers above a certain cutoff should be concentrated in a similar area of the MDS perceptual space). As such, Figure 5 maps the cutoffs from the tree onto the perceptual similarity space in Figure 3. Indeed, higher pitch speakers are concentrated in the top left (i.e., lower D1 scores) in red, with the (low-pitch) laggers also concentrated on the left, and the (low-pitch) leaders to the right of the space.

Click here to view code.
pitch_PC1_ellipses <- regression_df %>%
  mutate(
    PC1_tree = case_when(LeaderLaggerScore >= -0.065 ~ "Leader", T ~ "Lagger"),
    pitch_tree = case_when(MeanPitch >= 0.96 ~ "High", T ~ "Low"),
    PC1_pitch_tree = case_when(
      pitch_tree == "High" ~ "High",
      pitch_tree != "High" &
        PC1_tree == "Lagger" ~ "LaggerLow",
      T ~ "LeaderLow"
    )
  ) %>%
  ggplot(aes(y = D2_PW, x = D1_PW, group = PC1_pitch_tree)) +
  geom_encircle(
    data = . %>% filter(PC1_pitch_tree == "High"),
    fill = "#f0496a",
    alpha = 0.4
  ) +
  geom_encircle(
    data = . %>% filter(
      PC1_pitch_tree == "LaggerLow",
      D1_PW < 1,
      D2_PW < (0.75),
      !(D1_PW > (0) & D2_PW < 0)
    ),
    fill = "#ba16fe",
    alpha = 0.4
  ) +
  geom_encircle(
    data = . %>% filter(PC1_pitch_tree == "LeaderLow", !(D1_PW < (0.25) &
      D2_PW > 0)),
    fill = "#ffd117",
    alpha = 0.4
  ) +
  geom_point(
    aes(fill = PC1_pitch_tree, shape = PC1_tree),
    colour = "black",
    size = 5
  ) +
  scale_fill_manual(values = c("#f0496a", "#ba16fe", "#ffd117")) +
  scale_colour_manual(values = c("#f0496a", "#ba16fe", "#ffd117")) +
  scale_shape_manual(values = c(21, 24, 23, 25)) +
  labs(
    title = "Leaders and laggers by pitch",
    x = "Dimension 1",
    y = "Dimension 2",
    fill = "LL/Pitch",
    shape = "Pitch"
  ) +
  guides(fill = guide_legend(override.aes = list(shape = 21))) +
  coord_fixed() +
  theme(legend.position = "bottom")

pitch_PC1_ellipses

# Save figure
ggsave(
  path = here("Figures"),
  dpi = 300,
  filename = "pitch_PC1_ellipses.png",
  height = 1650,
  width = 1800,
  units = "px"
)

Figure 5: Visualisation of the MDS output showing leaders as distinct from slow and fast laggers

5.2.1.1 Evaluating the stability of D1 predictor importance with random forests

This section applies random forests to evaluate the stability of our independent variables in predicting D1. The next code chunk uses rand_forest to define a model that creates a large number of decision trees, each independent of the others. The final prediction uses all predictions from the individual trees and combines them. We can then extract from the combined trees the estimated ‘importance’ of our four independent variables in predicting the dependent variable.

Click here to view code.
# Random forest settings
set.seed(9)

random_f <-
  rand_forest(trees = 1000) %>%
  set_engine("ranger",
    importance = "permutation",
    alpha = 0.05
  ) %>%
  set_mode("regression")

d1_rf_fit <-
  random_f %>%
  fit(
    D1_PW ~
      LeaderLaggerScore +
      BackVowelScore +
      ArticRate +
      MeanPitch,
    data = regression_df
  )

Figure 6 shows the extracted estimates, ranked by their importance in predicting D1. The importance value on the horizontal axis reflects whether a variable has a positive effect on the predictive performance of the forest (i.e., a positive value indicates more frequent use of a variable in making key decisions within the forest of decision trees, and greater improvements to model fit when it is used in these decisions). These measures reflect the importance of individual variables and are not designed to capture interactions (see Wright, Ziegler, and König 2016). Figure 6 upholds the relative importance of mean pitch and the leader-lagger continuum in predicting D1, with the back-vowel score least important to predicting a speaker’s D1 score.

Click here to view code.
d1_rf_fit %>%
  extract_fit_engine() %>%
  vip() +
  theme_bw(base_size = 12) +
  labs(y = "Importance (Random Forest)")

Figure 6: Random forests predicting D1

5.2.2 Predicting D2 with a regression tree

Figure 7 shows the reported result of the regression tree fit predicting Dimension 2. Here, articulation rate is represented in the first node of the tree. Slower speakers are estimated to have a lower D2 score (n = 15). Within the faster speakers, higher pitch speakers (n = 7) are estimated to have a higher D2 than lower pitch speakers (n = 16).

Click here to view code.
set.seed(6)

d2_fit <-
  tree_spec %>% fit(
    D2_PW ~
      LeaderLaggerScore +
      BackVowelScore +
      ArticRate +
      MeanPitch,
    data = regression_df
  )

d2_fit %>%
  extract_fit_engine() %>%
  rpart.plot(
    roundint = FALSE,
    extra = 101,
    box.palette = c(
      "#09c9c3",
      "#277EB2",
      "#F9A742",
      "#F87233",
      "#F87233",
      "#f0496a"
    ),
    xflip = T
  )

Figure 7: Output of regression tree predicting Dimension 2

The next code chunk saves Figure 7.

Click here to view code.
ppi <- 300

png(
  here("Figures", "D2RegressionTree2.png"),
  #  width = 1950,
  #  height = 1000,
  width = 1400,
  height = 1400,
  res = ppi
)

d2_fit %>%
  extract_fit_engine() %>%
  rpart.plot(
    roundint = FALSE,
    extra = 101,
    box.palette = c(
      "#09c9c3",
      "#277EB2",
      "#F9A742",
      "#F87233",
      "#F87233",
      "#f0496a"
    ),
    cex = 1,
    xflip = T
  )
dev.off()
png 
  2 

Figure 8 then maps the cutoffs from the tree onto the perceptual similarity space in Figure 3. We can see that slower speakers are concentrated in the bottom of the space (blue), with fast and high pitch speakers concentrated in the top left (red). Fast and lower pitch speakers are then concentrated in the middle in orange.

Click here to view code.
speed_pitch_ellipses <- regression_df %>%
  mutate(
    PC1_tree = case_when(LeaderLaggerScore >= -0.065 ~ "Leader", T ~ "Lagger"),
    speed_tree = case_when(ArticRate < -0.27 ~ "Slow", T ~ "Fast"),
    speed_tree2 = case_when(
      ArticRate < -0.27 ~ "Slow", 
      ArticRate > 0.17 ~ "Fast", 
      T ~ "Middle"
    ),
    pitch_tree = case_when(MeanPitch < 0.31 ~ "Low", T ~ "High"),
    speed_pitch_tree = case_when(
      speed_tree == "Slow" ~ "Slow",
      speed_tree == "Fast" & pitch_tree == "Low" ~ "FastLow",
      T ~ "FastHigh"
    ),
    speed_pitch_tree2 = case_when(
      speed_tree == "Slow" &
        pitch_tree == "Low" ~ "SlowLow",
      speed_tree == "Slow" &
        pitch_tree == "High" ~ "SlowHigh",
      speed_tree == "Fast" & pitch_tree == "Low" ~ "FastLow",
      T ~ "FastHigh"
    )
  ) %>%
  ggplot(aes(y = D2_PW, x = D1_PW)) +
  geom_encircle(
    data = . %>% filter(speed_pitch_tree == "FastHigh"),
    fill = "#f0496a",
    alpha = 0.4
  ) +
  geom_encircle(
    data = . %>% filter(speed_pitch_tree == "FastLow", D1_PW < 1, D1_PW > (-0.5)),
    fill = "#F9A742",
    alpha = 0.4
  ) +
  geom_encircle(
    data = . %>% filter(speed_pitch_tree == "Slow", D1_PW < (0.1), D2_PW < 0.25),
    fill = "#09c9c3",
    alpha = 0.4
  ) +
  geom_encircle(
    data = . %>% filter(speed_pitch_tree == "Slow", D1_PW > (-0.1), D2_PW < 0.25),
    fill = "#09c9c3",
    alpha = 0.4
  ) +
  geom_point(
    aes(fill = speed_pitch_tree, shape = pitch_tree),
    colour = "black",
    size = 5
  ) +
  labs(
    title = "Speed and pitch",
    fill = "Speed/Pitch",
    x = "Dimension 1",
    y = "Dimension 2",
    shape = "Pitch"
  ) +
  scale_fill_manual(values = c("#f0496a", "#F9A742", "#09c9c3")) +
  scale_colour_manual(values = c("#f0496a", "#F9A742", "#09c9c3")) +
  scale_shape_manual(values = c(25, 22)) +
  theme(legend.position = "bottom") +
  coord_fixed() +
  guides(fill = guide_legend(override.aes = list(shape = 21)))

speed_pitch_ellipses

Figure 8: Visualisation of the MDS output showing slow speakers as distinct from high and fast speakers

The next code chunk saves Figure 8.

Click here to view code.
ggsave(
  path = here("Figures"),
  dpi = 300,
  filename = "speed_pitch_ellipses.png",
  height = 1650,
  width = 1800,
  units = "px"
)

5.2.2.1 Evaluating the stability of D2 predictor importance with random forests

Figure 9 shows the results of the random forest predicting Dimension 2. The relative importance of articulation rate and mean pitch is upheld, and the leader-lagger continuum is the least important predictor of D2 scores. However, the random forest also points to a positive impact of the Back Vowel Configuration (BVC) on predicting Dimension 2, albeit one that is much less important than articulation rate and pitch. It is, therefore, possible that listeners differentiate between speakers based on the BVC if speed and pitch are controlled for. The evidence here points, minimally, to speed and pitch being more systematically perceptually salient to listeners than their covarying back vowel patterns.

Click here to view code.
set.seed(6)

rf_fit2 <-
  random_f %>%
  fit(
    D2_PW ~
      LeaderLaggerScore +
      BackVowelScore +
      ArticRate +
      MeanPitch,
    data = regression_df
  )

rf_fit2 %>%
  extract_fit_engine() %>%
  vip() +
  theme_bw(base_size = 12) +
  labs(y = "Importance (Random Forest)")

Figure 9: Random forests predicting D2

This is supported by Figure 10, which shows the perceptual space based on speakers’ BVC scores. As BVC scores did not emerge in the regression tree, we do not have an obvious cutoff for differentiating between speakers. As such, we have marked speakers whose back-vowel configuration scores are more than 0.5 standard deviations above the mean, more than 0.5 standard deviations below the mean, or in between.

Speakers with more negative BVC scores are concentrated in the bottom half of the space, particularly the bottom right. However, there is limited difference between speakers in the middle and those with more positive BVC scores, and there is overlap between speakers with negative BVC scores and some of the slow leaders. There is also a strong overlap with articulation rate: most of the speakers with middle or negative BVC scores are also slower (only 3 of the 14 slower speakers have a strongly positive BVC score). It is, consequently, difficult to disentangle whether it really is speakers’ BVC scores that emerge as important in Figure 9, or whether this is a by-product of the importance of articulation rate in differentiating between speakers along Dimension 2.
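The sd_calculate() helper used in the next code chunk is not defined in this section (it is defined elsewhere in the supplementary code). A minimal sketch of the behaviour assumed here, namely binning values by whether they fall more than a given number of standard deviations below the mean, within the band, or above it; this re-implementation is illustrative and is not the repository’s actual helper.

```r
# Hypothetical re-implementation of sd_calculate() as used below:
# label values more than `threshold` SDs below the mean, within the
# band, or above it. The real helper is defined elsewhere in the
# repository and may differ in detail.
sd_calculate <- function(x, threshold, labels) {
  lower <- mean(x) - threshold * sd(x)
  upper <- mean(x) + threshold * sd(x)
  ifelse(x < lower, labels[1],
         ifelse(x > upper, labels[3], labels[2]))
}

scores <- c(-2, -0.2, 0, 0.3, 2)
sd_calculate(scores, 0.5, c("Negative", "Middle", "Positive"))
# c("Negative", "Middle", "Middle", "Middle", "Positive")
```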

Click here to view code.
regression_df %>%
  mutate(
    PC1_tree = case_when(LeaderLaggerScore >= -0.065 ~ "Leader", T ~ "Lagger"),
    speed_tree = case_when(ArticRate < -0.27 ~ "Slow", T ~ "Fast"),
    PC2_cat = case_when(BackVowelScore > 0 ~ "Positive", T ~ "Negative"),
    PC2_cat2 = sd_calculate(
      BackVowelScore, 0.5, c("Negative", "Middle", "Positive")
    ),
    pitch_tree = case_when(MeanPitch < 0.31 ~ "Low", T ~ "High")
  ) %>%
  ggplot(aes(y = D2_PW, D1_PW)) +
  geom_point(aes(fill = PC2_cat2, shape = speed_tree),
    colour = "black",
    size = 5
  ) +
  labs(
    title = "Speed and Back-Vowel Configuration",
    fill = "BVC",
    x = "Dimension 1",
    y = "Dimension 2",
    shape = "Artic Rate"
  ) +
  scale_fill_manual(values = c("white", "#fdd167", "#e53d44")) +
  scale_shape_manual(values = c(21, 22)) +
  geom_hline(yintercept = 0) +
  coord_fixed() +
  guides(fill = guide_legend(override.aes = list(shape = 21)))

Figure 10: Speakers with negative and positive Back-Vowel Configuration Scores

The collective results indicate that it remains possible that listeners can hear differences in speakers’ back vowels. However, there is not a clear interpretation of this within the MDS space, and the BVC does not reliably differentiate between speakers to the same extent as their speed, pitch and leader-lagger scores.

5.2.3 Visualising regression tree outputs in two-dimensions

Figure 11 combines the outputs from the analyses of Dimension 1 and Dimension 2 in an interpretation of the overall space. We can discern five main groups of speakers based on the cutoffs for speed, pitch, and the leader-lagger continuum:

  1. Slower and/or lower pitched leaders (yellow, bottom right)

  2. Slower and/or lower pitched laggers (purple, bottom left).

  3. Leaders who are both faster and lower pitched (orange, middle).

  4. Higher pitched speakers, regardless of whether they are a leader or lagger (red, top left).

  5. Laggers who are fast and/or higher pitched (dark orange, top).

While there is some overlap between the different groups, there is nonetheless evidence that listeners make subtle perceptual distinctions between speakers based on all three variables. The MDS therefore points to speed and pitch, and one of the covarying NZE vowel patterns, as underlying the perceptual relationships between speakers.

Click here to view code.
main_groups <- regression_df %>%
  mutate(
    PC1_tree = case_when(LeaderLaggerScore > -0.065 ~ "Leader", T ~ "Lagger"),
    speed_tree = case_when(ArticRate < -0.27 ~ "Slow", T ~ "Fast"),
    pitch_tree = case_when(MeanPitch <= 0.96 ~ "Low", T ~ "High"),
    pitch_tree2 = case_when(MeanPitch < 0.31 ~ "Low", T ~ "High"),
    speed_pitch_tree = case_when(
      speed_tree == "Slow" ~ "Slow",
      speed_tree == "Fast" & pitch_tree == "Low" ~ "FastLow",
      T ~ "FastHigh"
    ),
    speed_pitch_tree2 = case_when(
      speed_tree == "Slow" &
        pitch_tree2 == "Low" ~ "SlowLow",
      speed_tree == "Slow" &
        pitch_tree2 == "High" ~ "SlowHigh",
      speed_tree == "Fast" & pitch_tree2 == "Low" ~ "FastLow",
      T ~ "FastHigh"
    ),
    SlowOrLow = case_when(
      PC1_tree == "Leader" &
        (pitch_tree2 == "Low" |
          speed_tree == "Slow") ~ "SlowOrLowLeader",
      PC1_tree == "Lagger" &
        (pitch_tree2 == "Low" |
          speed_tree == "Slow") ~ "SlowOrLowLagger",
      T ~ "NotSlowOrLow"
    ),
    SlowAndLow = case_when(
      PC1_tree == "Leader" &
        (pitch_tree2 == "Low" &
          speed_tree == "Slow") ~ "SlowAndLowLeader",
      PC1_tree == "Lagger" &
        (pitch_tree2 == "Low" &
          speed_tree == "Slow") ~ "SlowAndLowLagger",
      T ~ "NotSlowAndLow"
    ),
    FastOrHigh = case_when(
      PC1_tree == "Leader" &
        (pitch_tree2 == "High" |
          speed_tree == "Fast") ~ "FastOrHighLeader",
      PC1_tree == "Lagger" &
        (pitch_tree2 == "High" |
          speed_tree == "Fast") ~ "FastOrHighLagger",
      T ~ "NotFastOrHigh"
    ),
    LowAndFast = case_when(
      PC1_tree == "Leader" &
        speed_pitch_tree2 == "FastLow" ~ "FastAndLowLeader",
      PC1_tree == "Lagger" &
        speed_pitch_tree2 == "FastLow" ~ "FastAndLowLagger",
      T ~ "NotFastAndLow"
    ),
    Main_Groups = case_when(
      LowAndFast == "FastAndLowLeader" &
        (D1_PW < 0.6) ~ "Fast and low Leaders",
      SlowOrLow == "SlowOrLowLeader" &
        (D1_PW > -0.1) ~ "Slow/low Leaders",
      (pitch_tree2 == "High" |
        speed_tree == "Fast") &
        PC1_tree == "Lagger" & D2_PW > 0.21 ~ "Fast/high Laggers",
      pitch_tree2 == "High" ~ "High Laggers and Leaders",
      ((pitch_tree2 == "Low" |
        speed_tree == "Slow") &
        PC1_tree == "Lagger" & D2_PW < 0.21
      ) ~ "Slow/low Laggers"
    )
  ) %>%
  ggplot(aes(y = D2_PW, x = D1_PW)) +
    geom_point(aes(shape = Main_Groups, fill = Main_Groups),
      size = 3,
      alpha = 0.01
    ) +
    geom_encircle(
      data = . %>% filter(Main_Groups == "Slow/low Leaders"),
      colour = "black",
      fill = "#ffd117",
      alpha = 0.3
    ) +
    geom_encircle(
      data = . %>% filter(
        Main_Groups == "Slow/low Laggers" |
          (pitch_tree2 == "High" &
            PC1_tree == "Lagger"),
        D2_PW < 0
      ),
      colour = "black",
      fill = "#a541f7",
      alpha = 0.3
    ) +
    geom_encircle(
      data = . %>% filter(pitch_tree2 == "High", D1_PW < 0.15),
      colour = "black",
      fill = "#f0496a",
      alpha = 0.3
    ) +
    geom_encircle(
      data = . %>% filter(Main_Groups == "Fast/high Laggers", D2_PW > 0.21),
      colour = "black",
      fill = "#F87233",
      alpha = 0.3
    ) +
    geom_encircle(
      data = . %>% filter(Main_Groups == "Fast and low Leaders", D1_PW < 0.6),
      colour = "black",
      fill = "#F9A742",
      alpha = 0.3
    ) +
    geom_point(
      data = . %>% filter(Main_Groups == "High Laggers and Leaders"),
      alpha = 0.85,
      size = 7,
      fill = "#f0496a",
      colour = "black",
      shape = 23
    ) +
    geom_point(
      data = . %>% filter(Main_Groups == "Fast/high Laggers", D2_PW > 0.21),
      alpha = 0.85,
      size = 6,
      fill = "#F87233",
      colour = "black",
      shape = 21
    ) +
    geom_point(
      data = . %>% filter(Main_Groups == "Slow/low Laggers", D2_PW < 0.21),
      alpha = 0.85,
      size = 6,
      fill = "#a541f7",
      colour = "black",
      shape = 22
    ) +
    geom_point(
      data = . %>% filter(Main_Groups == "Slow/low Leaders"),
      alpha = 0.85,
      size = 6,
      fill = "#ffd117",
      colour = "black",
      shape = 25
    ) +
    geom_point(
      data = . %>% filter(Main_Groups == "Fast and low Leaders"),
      alpha = 0.85,
      size = 6,
      fill = "#F9A742",
      colour = "black",
      shape = 24
    ) +
    labs(
      fill = "Main Groups",
      x = "Dimension 1",
      y = "Dimension 2",
      shape = "Main Groups"
    ) +
    annotate(
      "label",
      label = c("Leaders"),
      x = 0.1,
      y = 0.1,
      colour = "black"
    ) +
    annotate(
      "label",
      label = c("Fast and low"),
      x = 0.1,
      y = -0.1,
      fill = "#F9A742"
    ) +
    annotate(
      "label",
      label = c("Leaders"),
      x = 0.6,
      y = -0.15,
      colour = "black"
    ) +
    annotate(
      "label",
      label = c("Slow and/or low"),
      x = 0.6,
      y = -0.35,
      fill = "#ffd117"
    ) +
    annotate(
      "label",
      label = c("Laggers"),
      x = -0.25,
      y = -0.35,
      colour = "black"
    ) +
    annotate(
      "label",
      label = c("Slow and/or low"),
      x = -0.25,
      y = -0.55,
      fill = "#a541f7"
    ) +
    annotate(
      "label",
      label = c("Leaders+Laggers"),
      x = -0.4,
      y = 0.35,
      colour = "black"
    ) +
    annotate(
      "label",
      label = c("High"),
      x = -0.4,
      y = 0.15,
      fill = "#f0496a"
    ) +
    annotate(
      "label",
      label = c("Laggers"),
      x = 0.25,
      y = 0.75,
      colour = "black"
    ) +
    annotate(
      "label",
      label = c("Fast and/or high"),
      x = 0.25,
      y = 0.55,
      fill = "#F87233"
    ) +
    scale_fill_manual(
      values = c("#F9A742", "#F87233", "#f0496a", "#a541f7", "#ffd117"),
      guide = guide_legend(override.aes = list(
        shape = c(24, 21, 23, 22, 25),
        fill = c("#F9A742", "#F87233", "#f0496a", "#a541f7", "#ffd117"),
        alpha = 1,
        size = 6
      ))
    ) +
    scale_shape_manual(values = c(24, 21, 23, 22, 25)) +
    theme_bw(base_size = 14) +
    coord_fixed() +
    theme(
      legend.position = "right",
      legend.key.size = unit(1.5, "cm"),
      # change legend key size
      legend.key.height = unit(1.5, "cm"),
      # change legend key height
      legend.key.width = unit(1.5, "cm"),
      # change legend key width
      legend.title = element_text(size = 14),
      # change legend title font size
      legend.text = element_text(size = 12)
      # change legend text font size)
    )
main_groups

# Save the image
ggsave(
  path = here("Figures"),
  dpi = 300,
  filename = "MainGroups.png",
  width = 3200,
  height = 1650,
  units = "px"
)

Figure 11: Main production groups in perceptual space

The next two code chunks combine, and add text to, the previous figures in this section (the D1 and D2 regression tree outputs, their mapping onto the MDS space, and the interpretation of the overall space). The output is shown in Figure 12, which is Figure 2 in the manuscript.

Click here to view code.
# Load regression tree predicting D1
# Annotate each node with speaker group
image_D1 <-
  image_read(
    here(
      "Figures",
      "D1RegressionTree2.png"
    )
  ) %>%
  image_annotate(
    "Estimated D1",
    size = 75,
    location = "+575+5",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "High pitch",
    size = 60,
    location = "+175+600",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "Lower pitch",
    size = 60,
    location = "+1200+450",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "Lower pitch Laggers",
    size = 60,
    #  gravity = "1",
    location = "+700+900",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "Lower pitch Leaders",
    size = 60,
    location = "+1350+900",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  )

# Load regression tree predicting D2
# Annotate with main speaker groups
# Unrotated
image_D2 <-
  image_read(here("Figures", "D2RegressionTree2.png")) %>%
  image_annotate(
    "Estimated D2",
    size = 75,
    location = "+575+5",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "Faster high pitch",
    size = 60,
    location = "+25+900",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "Faster speakers",
    size = 60,
    location = "+300+450",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "Slower speakers",
    size = 60,
    location = "+900+650",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "Faster low pitch",
    size = 60,
    location = "+525+900",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  )

# Load regression tree predicting D2 # Annotate with main speaker groups
# Rotated
D2_rotate <-
  image_read(here("Figures", "D2RegressionTree2.png")) %>%
  image_annotate(
    "Estimated D2",
    size = 75,
    location = "+575+5",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_rotate(degrees = 90) %>%
  image_annotate(
    "Faster high pitch",
    size = 60,
    location = "+400+100",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "Slower speakers",
    size = 60,
    location = "+500+1200",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "Faster low pitch",
    size = 60,
    location = "+400+750",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  )

# Save annotated .png files
image_write(
  image_D1,
  path = here(
    "Figures",
    "D1RegressionTreeAnnotated.png"
  ),
  format = "png"
)

image_write(
  image_D2,
  path = here(
    "Figures",
    "D2RegressionTreeAnnotated.png"
  ),
  format = "png"
)

image_write(
  D2_rotate,
  path = here(
    "Figures",
    "D2RegressionTreeAnnotatedRotated.png"
  ),
  format = "png"
)
Click here to view code.
# Load interpretation of full MDS space
maingroups <-
  image_read(
    here(
      "Figures",
      "MainGroups.png"
    )
  )

# Load plot of output of regression tree predicting D1
D1RegressionTree <-
  image_read(
    here(
      "Figures",
      "D1RegressionTreeAnnotated.png"
    )
  )

# Load plot of output of regression tree predicting D2
D2RegressionTree <-
  image_read(
    here(
      "Figures",
      "D2RegressionTreeAnnotatedRotated.png"
    )
  )

# Load MDS space with cutoffs from D1 regression tree mapped
D1MDS <-
  image_read(
    here(
      "Figures",
      "pitch_PC1_ellipses.png"
    )
  )

# Load MDS space with cutoffs from D2 regression tree mapped
D2MDS <-
  image_read(
    here(
      "Figures",
      "speed_pitch_ellipses.png"
    )
  )

# Load blank image (used to create balance in final image)
blank <- image_read(
  here(
    "Figures",
    "Blank.png"
  )
)

# Crop blank space to add top of interpretation of full space
main_top <- image_crop(blank, geometry = "3400x200")

# Add black space to top of interpretation of full space
to_append_main <- c(main_top, maingroups)
maingroups <- image_append(to_append_main, stack = T)

# Add a black border
maingroups <-
  image_border(maingroups, color = "#000000", geometry = "15x15")

# Crop blank space to add to top and bottom of D1 regression tree and MDS space
D1_top <- image_crop(blank, geometry = "1900x350")

# Combine blank spaces with D1 regression tree and MDS space

to_append_D1 <- c(D1_top, D1RegressionTree, D1MDS, D1_top)

D1_appended <- image_append(to_append_D1, stack = T)

# Add white and black borders
D1_appended <-
  image_border(D1_appended, color = "#ffffff", geometry = "30x15")

D1_appended <-
  image_border(D1_appended, color = "#000000", geometry = "15x15")

# Crop blank space to add to D2 regression tree and MDS space

D2_top <- image_crop(blank, geometry = "3400x200")

# Combine blank space with D2 regression tree and MDS space figures
to_append_D2 <- c(D2MDS, D2RegressionTree)

D2_appended <- image_append(to_append_D2, stack = F)

to_append_D2_2 <- c(D2_top, D2_appended)

D2_appended_2 <- image_append(to_append_D2_2, stack = T)

# Add a black border
D2_appended_2 <-
  image_border(D2_appended_2, color = "#000000", geometry = "15x15")

# Combine D2 with main groups image
to_append_MainD2 <- c(D2_appended_2, maingroups)

MainD2_appended <- image_append(to_append_MainD2, stack = T)

# Combine D1 figures with D2 figures and main group figure
toappend_D1_MainD2 <- c(D1_appended, MainD2_appended)

D1_MainD2_appended <- image_append(toappend_D1_MainD2, stack = F)

# Add black border
D1_MainD2_appended <-
  image_border(D1_MainD2_appended,
    color = "#000000",
    geometry = "15x15"
  )

# Annotate full image with additional text
D1_MainD2_appended <- D1_MainD2_appended %>%
  image_annotate(
    "A: Analysis of D1",
    size = 150,
    location = "+100+50",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "B: Analysis of D2",
    size = 150,
    location = "+2100+50",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  ) %>%
  image_annotate(
    "C: Combined interpretation of D1 and D2",
    size = 150,
    location = "+2100+1950",
    color = "black",
    degrees = 0,
    boxcolor = "white"
  )
D1_MainD2_appended

# save interpretation
image_write(
  D1_MainD2_appended,
  path = here("Figures", "D1_MainD2_appended.png"),
  format = "png",
  density = 600
)

Figure 12: Figure 2 from the manuscript. (A) the results of the regression tree for Dimension 1, (B) the results of the regression tree for Dimension 2, and (C) the interpretation of the perceptual space based on (A) and (B).

5.3 Predicting pairwise similarity ratings with a GAMM

This section contains the model formula and diagnostics for the generalised additive mixed model reported in the main manuscript.

5.3.1 Data wrangling

The next code chunk creates a data frame with unique rows for the speed, pitch, leader-lagger score and back-vowel configuration score for the first and second stimulus in each stimuli pair that participants rated.

Click here to view code.
# Select relevant variables
variables <-
  c(
    "workerId",
    "Stimuli1_anon",
    "Stimuli2_anon",
    "enteredResponse",
    "Stimuli_anon",
    "scaledResponse",
    "ReScaledResponse",
    "pair_id_ordered",
    "pair_id_unordered",
    "LeaderLaggerScore",
    "BackVowelScore",
    "ArticRate",
    "MeanPitch",
    "count"
  )

variables2 <-
  c(
    "workerId",
    "Stimuli1_anon",
    "Stimuli2_anon",
    "enteredResponse",
    "Stimuli_anon",
    "scaledResponse",
    "ReScaledResponse",
    "pair_id_ordered",
    "pair_id_unordered",
    "count"
  )

PW_ratings_ID <- PW_ratings_ID %>%
  mutate(
    Stimuli1ID = as.character(Stimuli1ID),
    Stimuli2ID = as.character(Stimuli2ID),
    count = as.character(count)
  )

PW_ratings_1 <- PW_ratings_ID %>%
  left_join(measures_df, by = c("Stimuli1ID" = "SpeakerId")) %>%
  select(all_of(variables)) %>%
  rename_with(
    ~ paste("Stim1", .x, sep = "_"), 
    contains(
      c(
        "artic",
        "pitch",
        "_PW",
        "Score"
      )
    )
  ) %>%
  select(
    all_of(variables2),
    pair_id_unordered,
    contains("Stim1"),
    contains("_PW")
  )

PW_ratings_2 <- PW_ratings_ID %>%
  left_join(measures_df, by = c("Stimuli2ID" = "SpeakerId")) %>%
  select(all_of(variables)) %>%
  rename_with(~ paste("Stim2", .x, sep = "_"), contains(
    c(
      "artic",
      "pitch",
      "_PW",
      "Score"
    )
  )) %>%
  select(
    all_of(variables2),
    pair_id_unordered,
    contains("Stim2"),
    contains("_PW")
  )

PW_ratings_com <- PW_ratings_1 %>%
  left_join(
    PW_ratings_2,
    by = c(
      "workerId",
      "Stimuli1_anon",
      "Stimuli2_anon",
      "enteredResponse",
      "Stimuli_anon",
      "scaledResponse",
      "ReScaledResponse",
      "pair_id_ordered",
      "pair_id_unordered",
      "count"
    )
  ) %>%
  ungroup()

# Create factor variables.
PW_ratings_com$workerId <- as.factor(PW_ratings_com$workerId)
PW_ratings_com$pair_id_unordered <- as.factor(PW_ratings_com$pair_id_unordered)
PW_ratings_com$Stimuli1_anon <- as.factor(PW_ratings_com$Stimuli1_anon)
PW_ratings_com$Stimuli2_anon <- as.factor(PW_ratings_com$Stimuli2_anon)
PW_ratings_com$pair_id_ordered <- as.factor(PW_ratings_com$pair_id_ordered)

The next code chunk fits the GAMM used to predict the pairwise similarity ratings. The model structure consists of four ‘tensor smooths’, one each for articulation rate, pitch, leader-lagger score, and back-vowel configuration score, with each smooth representing a two-way interaction between the value for stimulus 1 and the value for stimulus 2.

Click here to view code.
gamm_fit <-
  bam(
    ReScaledResponse ~
      te(Stim1_ArticRate, Stim2_ArticRate, k = 4) +
      te(Stim1_MeanPitch, Stim2_MeanPitch, k = 4) +
      te(Stim1_LeaderLaggerScore, Stim2_LeaderLaggerScore, k = 4) +
      te(Stim1_BackVowelScore, Stim2_BackVowelScore, k = 4) +
      s(pair_id_ordered, bs = "re") +
      s(workerId, bs = "re"),
    discrete = T,
    nthreads = 4,
    data = PW_ratings_com
  )

5.3.2 Model summary

The summary below presents the model output, while the tabs in Section 5.3.3 present residual and quantile plots. The relationships between first and second stimulus articulation rate, pitch, and leader-lagger scores are significant. The relationship between the first and second stimulus back-vowel configuration scores does not significantly predict similarity ratings.

Click here to view code.
summary(gamm_fit, re.test = F)

Family: gaussian 
Link function: identity 

Formula:
ReScaledResponse ~ te(Stim1_ArticRate, Stim2_ArticRate, k = 4) + 
    te(Stim1_MeanPitch, Stim2_MeanPitch, k = 4) + te(Stim1_LeaderLaggerScore, 
    Stim2_LeaderLaggerScore, k = 4) + te(Stim1_BackVowelScore, 
    Stim2_BackVowelScore, k = 4) + s(pair_id_ordered, bs = "re") + 
    s(workerId, bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.61643    0.01664   217.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                                                      edf Ref.df     F  p-value
te(Stim1_ArticRate,Stim2_ArticRate)                 9.518 10.476 6.254  < 2e-16
te(Stim1_MeanPitch,Stim2_MeanPitch)                 8.056  9.224 6.232  < 2e-16
te(Stim1_LeaderLaggerScore,Stim2_LeaderLaggerScore) 4.726  5.151 6.172 8.37e-06
te(Stim1_BackVowelScore,Stim2_BackVowelScore)       4.800  5.538 0.646    0.688
                                                       
te(Stim1_ArticRate,Stim2_ArticRate)                 ***
te(Stim1_MeanPitch,Stim2_MeanPitch)                 ***
te(Stim1_LeaderLaggerScore,Stim2_LeaderLaggerScore) ***
te(Stim1_BackVowelScore,Stim2_BackVowelScore)          
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.184   Deviance explained = 27.1%
fREML = 6956.7  Scale est. = 0.79473   n = 5054

5.3.3 Model diagnostics

The tabs below contain model diagnostic plots from the fitted model.

Click here to view code.
# creates normal quantile plots for model residuals
appraise(gamm_fit)

Plotted model residuals from fitted GAMM

Click here to view code.
# Creates contour plots for each interaction term and a quantile plot for random
# effect in the model
plot(gamm_fit)

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

5.3.4 Model predictions

We can now extract the model estimates in the code chunk below, and plot the significant tensor smooths alongside the individual rated pairs in the tabs below. The horizontal axes correspond to the relevant measure for the first stimulus in a pair (speed in Figure 13, pitch in Figure 14, and leader-lagger score in Figure 15). The vertical axes correspond to the same measure for the second stimulus in a pair. A lighter value corresponds to a higher estimated similarity rating, a darker value to a lower estimated similarity rating.

Click here to view code.
gamm_estimates <- smooth_estimates(gamm_fit) %>%
  add_confint()
Click here to view code.
gamm_estimates %>%
  filter(
    .smooth %in% c("te(Stim1_ArticRate,Stim2_ArticRate)"),
    .lower_ci > 0 | .upper_ci < 0
  ) %>%
  ggplot() +
  geom_contour_filled(aes(
    x = Stim1_ArticRate,
    y = Stim2_ArticRate,
    z = .estimate
  )) +
  geom_point(
    data = PW_ratings_com,
    aes(x = Stim1_ArticRate, y = Stim2_ArticRate),
    alpha = 0.05,
    colour = "black"
  ) +
  labs(
    x = "Stim 1 Articulation rate",
    y = "Stim 2 Articulation rate",
    fill = "Similarity Rating"
  )

Figure 13: Model predictions of perceived similarity according to stimuli speed

Click here to view code.
gamm_estimates %>%
  filter(
    .smooth %in% c("te(Stim1_MeanPitch,Stim2_MeanPitch)"),
    .lower_ci > 0 | .upper_ci < 0
  ) %>%
  ggplot() +
  geom_contour_filled(aes(
    x = Stim1_MeanPitch,
    y = Stim2_MeanPitch,
    z = .estimate
  )) +
  geom_point(
    data = PW_ratings_com,
    aes(x = Stim1_MeanPitch, y = Stim2_MeanPitch),
    alpha = 0.05,
    colour = "black"
  ) +
  labs(
    x = "Stim 1 Mean pitch",
    y = "Stim 2 Mean pitch",
    fill = "Similarity Rating"
  )

Figure 14: Model predictions of perceived similarity according to stimuli pitch

Click here to view code.
gamm_estimates %>%
  filter(
    .smooth %in% c("te(Stim1_LeaderLaggerScore,Stim2_LeaderLaggerScore)"),
    .lower_ci > 0 | .upper_ci < 0
  ) %>%
  ggplot() +
  geom_contour_filled(aes(
    x = Stim1_LeaderLaggerScore,
    y = Stim2_LeaderLaggerScore,
    z = .estimate
  )) +
  geom_point(
    data = PW_ratings_com,
    aes(x = Stim1_LeaderLaggerScore, y = Stim2_LeaderLaggerScore),
    alpha = 0.05,
    colour = "black"
  ) +
  labs(
    x = "Stim 1 Leader-lagger score", y = "Stim 2 Leader-lagger score", fill =
      "Similarity Rating"
  )

Figure 15: Model predictions of perceived similarity according to stimuli leader-lagger scores

The following code chunk saves the above figures as Figures 3A, 3B, and 4 from the manuscript.

Click here to view code.
figure_3 <- gamm_estimates %>%
  filter(
    .smooth %in% c("te(Stim1_ArticRate,Stim2_ArticRate)"),
    .lower_ci > 0 | .upper_ci < 0
  ) %>%
  ggplot() +
  geom_contour_filled(aes(
    x = Stim1_ArticRate,
    y = Stim2_ArticRate,
    z = .estimate
  )) +
  geom_point(
    data = PW_ratings_com,
    aes(x = Stim1_ArticRate, y = Stim2_ArticRate),
    alpha = 0.05,
    colour = "black"
  ) +
  labs(
    x = "Stim 1 Articulation rate",
    y = "Stim 2 Articulation rate",
    fill = "Similarity Rating",
    title = "A"
  ) +
  gamm_estimates %>%
  filter(
    .smooth %in% c("te(Stim1_MeanPitch,Stim2_MeanPitch)"),
    .lower_ci > 0 | .upper_ci < 0
  ) %>%
  ggplot() +
  geom_contour_filled(aes(
    x = Stim1_MeanPitch,
    y = Stim2_MeanPitch,
    z = .estimate
  )) +
  geom_point(
    data = PW_ratings_com,
    aes(x = Stim1_MeanPitch, y = Stim2_MeanPitch),
    alpha = 0.05,
    colour = "black"
  ) +
  labs(
    x = "Stim 1 Mean pitch",
    y = "Stim 2 Mean pitch",
    fill = "Similarity Rating",
    title = "B"
  )

# Save figure 3
ggsave(
  figure_3,
  path = here("Figures"),
  dpi = 300,
  filename = "GAMMOutput_SpeedPitch.png",
  width = 9,
  height = 3.5
)

figure_4 <- gamm_estimates %>%
  filter(
    .smooth %in% c("te(Stim1_LeaderLaggerScore,Stim2_LeaderLaggerScore)"),
    .lower_ci > 0 | .upper_ci < 0
  ) %>%
  ggplot() +
  geom_contour_filled(aes(
    x = Stim1_LeaderLaggerScore,
    y = Stim2_LeaderLaggerScore,
    z = .estimate
  )) +
  geom_point(
    data = PW_ratings_com,
    aes(x = Stim1_LeaderLaggerScore, y = Stim2_LeaderLaggerScore),
    alpha = 0.05,
    colour = "black"
  ) +
  labs(
    x = "Stim 1 Leader-lagger score", y = "Stim 2 Leader-lagger score", fill =
      "Similarity Rating"
  )

# Save figure 4
ggsave(
  figure_4,
  path = here("Figures"),
  dpi = 300,
  filename = "GAMMOutput_PC1.png",
  width = 4.5,
  height = 3
)

6 Testing pre-registered correlations

We pre-registered that we would use pairwise correlations to investigate the acoustic correlates of Dimension 1 and Dimension 2. As such, Figure 16 shows a summary of the correlations between Dimensions 1 and 2 and speakers’ leader-lagger scores, their BVC scores, their articulation rate and their mean pitch. We have also included a measure of creak in these correlations, as this was mentioned in the pre-registration.

The results of the regression trees and random forests are upheld in that leader-lagger scores and mean pitch significantly correlate with Dimension 1 scores, and articulation rate and mean pitch significantly correlate with Dimension 2 scores. Consistent with the random forest results, back vowel scores also significantly correlate with Dimension 2 scores, albeit to a lesser extent than speed and pitch.

Click here to view code.
columns <-
  c(
    "LeaderLaggerScore",
    "BackVowelScore",
    "ArticRate",
    "MeanPitch",
    "creak_prop",
    "D1_PW",
    "D2_PW"
  )

correlogram1 <- measures_df %>%
  as_tibble() %>%
  select(all_of(columns))

correlogram_function(correlogram1)

Figure 16: Summary correlogram showing pre-registered correlations. Values in the lower diagonal represent correlation coefficients, while values in the upper diagonal represent p-values.

However, unlike the regression trees and random forest, these pairwise correlations do not give us a sense of how the different variables relate to one another in differentiating between speakers in the perceptual space (e.g., once high pitch speakers are accounted for, leader-lagger scores correlate more strongly with Dimension 1).

7 Data exclusions

In our pre-registration, we stated that we would “exclude participants whose variation in the rating scale is more than 2.5 standard deviations above or below the mean range of rating scores (i.e., participants who do not vary in their ratings at all or who alternate only between the extreme values).”

However, this methodology failed to identify listeners who alternated only between extreme values, despite the presence of participants who used only 0 and 1 values in the data. To fulfil our underlying goal of removing minimally and maximally variable participants, we instead excluded participants based on two criteria. First, participants who used a range smaller than 0.25 of the rating scale (i.e., all 40 ratings fell within the same quarter of the scale) were excluded. Second, participants who used the extremes of the rating scale (i.e., rated stimuli as 0 or 1) for 34 or more stimuli were also excluded. The distribution of participants’ ratings shows that almost all participants used a range of values and did not rely solely on the ends of the scale, marking those who did as outliers. These two criteria resulted in us excluding only seven participants from the analysis.

This section explains in more detail the rationale for excluding the seven participants who were not included in the reported analysis. Figure 17 plots the overall distribution of similarity ratings in a histogram using 10 bins (i.e., an 11-point scale between 0 and 1). We can see that participants overall use the full scale, but generally favour the lower end of the scale over the higher end.

Click here to view code.
example <- PW_ratings_unfiltered %>%
  filter(enteredResponse != "no-response") %>%
  # Convert to numeric
  mutate(enteredResponse = as.numeric(enteredResponse)) %>%
  select(
    workerId,
    Stimuli_anon,
    Stimuli1_anon,
    Stimuli2_anon,
    enteredResponse
  )

example |>
  ggplot(aes(x = enteredResponse)) +
  geom_histogram(bins = 11)

Figure 17: Distribution of unfiltered and unscaled participant similarity ratings

Calculating whether a participant’s range of used values was 2.5 standard deviations above or below the mean range only identified outliers who varied minimally in their ratings, and no participants who alternated between extreme values, despite there being participants who (almost) exclusively used 0 and 1 values. We therefore opted to exclude participants based on whether they used a range smaller than 0.25 (i.e., one quarter) of the scale, and whether 34 or more of their ratings were a 0 or a 1. The cutoff of 34 is arbitrary, but it captures a number of participants who effectively used 0 or 1 exclusively, with very minor variation at the extremes of the scale. Specifically, of the three excluded participants who did not exclusively use 0 or 1, all used only 0 or 1 ratings except for: one rating of 0.015; two ratings of 0.034 and 0.975; and four ratings greater than 0.9, respectively.

As Figure 18 shows, exclusively using the extremes of the rating scale is not consistent with the behaviour of most participants. Figure 18 displays the distribution of similarity ratings for 12 random participants, and for the 7 participants excluded under our criteria of either a) using less than a quarter of the rating scale or b) using only 0 or 1 for 34 or more of their 40 ratings. We can see that participant 294 did not use the range of the rating scale at all, staying at the midpoint for all of their ratings. The other excluded participants are concentrated almost exclusively at the extremes of the scale, in contrast to the randomly selected participants.

Click here to view code.
set.seed(3)

worker_summary <- example %>%
  group_by(workerId) %>%
  summarise(
    range_scale = max(enteredResponse) - min(enteredResponse),
    count_0 = sum(enteredResponse == 0),
    count_1 = sum(enteredResponse == 1),
    responses = n()
  )

to_remove <- worker_summary %>%
  mutate(extreme_count = count_0 + count_1) %>%
  filter(
    range_scale < 0.25 | extreme_count >= 34
  ) %>%
  pull(workerId)

random_participants <- example %>%
  filter(!workerId %in% to_remove)

random_participants <- sample(unique(random_participants$workerId), 12)

example |>
  filter(workerId %in% random_participants) %>%
  ggplot(aes(
    x = enteredResponse,
    fill = workerId
  )) +
  geom_histogram(
    alpha = 0.4,
    linewidth = 1,
    bins = 11,
    colour = "black"
  ) +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "12 Random participants") +
  theme(legend.position = "none") +
  example |>
  filter(workerId %in% to_remove) %>%
  ggplot(aes(
    x = enteredResponse,
    fill = workerId
  )) +
  geom_histogram(
    alpha = 0.4,
    linewidth = 1,
    bins = 11,
    colour = "black"
  ) +
  labs(title = "7 excluded participants") +
  scale_fill_brewer(palette = "Dark2")

Figure 18: Distribution of participant similarity ratings from excluded participants and a random selection of included participants

In our pre-registration, we also stated that we would filter outliers based on whether their completion time was more than 2.5 SDs below the mean time taken to complete the task. This criterion was never satisfied, so no participants were removed on this basis.
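For concreteness, the timing criterion amounts to a filter of the following shape. This is a sketch only: the data frames in these materials do not include completion times, so `participant_times` and its `completionTime` column (in seconds) are hypothetical stand-ins, here filled with toy values.

```r
library(dplyr)

# Hypothetical per-participant completion times (toy values)
participant_times <- tibble(
  workerId = paste0("P", 1:6),
  completionTime = c(540, 610, 580, 595, 570, 605)
)

# Flag anyone more than 2.5 SDs below the mean completion time
timing_outliers <- participant_times %>%
  filter(completionTime < mean(completionTime) - 2.5 * sd(completionTime)) %>%
  pull(workerId)

# An empty result corresponds to the criterion never being satisfied
timing_outliers
```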

8 Comparing across reported and pre-registered results

There remains, nonetheless, a reasonable question about whether changing our filtering criteria meaningfully changes the results of the analysis. As such, this section applies an MDS analysis, as applied to the reported data, to a dissimilarity matrix generated from the ratings data that only has participants removed as per the preregistration, and tests correlations between the Dimension 1 and Dimension 2 scores from this and the reported analysis. The section also applies the same GAMM formula to the pairwise ratings with only the pre-registered filtering applied.

As will become clear from the following sections, the different filtering criteria do not meaningfully change the results of the analysis. The same structures underlie both MDS analyses, and the same predictors emerge as significant in the GAMM.

8.1 Comparing MDS (Spline) dimensions

As with the reported MDS, Figure 19 indicates that we need at least one dimension, and that we are not including too few dimensions if we run a two-dimensional MDS on this data. We therefore again run a spline MDS with two dimensions.

Click here to view code.
set.seed(10)
prereg_test <- mds_test(
  df_prereg,
  n_boots = 100,
  n_perms = 100,
  test_dimensions = 5,
  mds_type = "mspline",
  spline_degree = 3,
  spline_int_knots = 5
)

plot_mds_test(prereg_test) +
  labs(y = "Reduction in Stress", x = "Number of Dimensions in MDS Analysis", colour = "Data")

Figure 19: Boxplots depict stress reduction as additional dimensions are added for bootstrapped samples (red) and permuted samples (blue). Stress reduction in the experimental data is depicted by black crosses.

The next code chunk again selects the two-dimensional MDS analysis with the best random start from 100 random starts.

Click here to view code.
set.seed(200)
fit_df_prereg <- NULL
for (i in 1:100) {
  fit_df_prereg[[i]] <- smacofSym(
    df_prereg,
    ndim = 2,
    type = "mspline",
    principal = T,
    init = "random",
    spline.degree = 3,
    spline.intKnots = 5,
    itmax = 2000
  )
}
ind <- which.min(sapply(fit_df_prereg, function(x) {
  x$stress
}))
fit_df_prereg <- fit_df_prereg[[ind]]
fit_df_prereg

Call:
smacofSym(delta = df_prereg, ndim = 2, type = "mspline", init = "random", 
    principal = T, itmax = 2000, spline.degree = 3, spline.intKnots = 5)

Model: Symmetric SMACOF 
Number of objects: 38 
Stress-1 value: 0.313 
Number of iterations: 278 
Click here to view code.
conf_PW_prereg <- fit_df_prereg$conf
dimensions_PW_prereg <- as.data.frame(conf_PW_prereg) %>%
  rename(D1_PW_PR = V1, D2_PW_PR = V2) %>%
  rownames_to_column(var = "Speaker")

The most direct way to assess whether the two MDS analyses have the same underlying structure is to test correlations between speakers’ D1 and D2 scores from each analysis. However, it is the relative distance between stimuli that matters for the interpretation of an MDS space; the position of stimuli within the space is otherwise arbitrary. While it is possible that stimulus D1 or D2 scores from separate MDS analyses with a similar underlying structure would directly align with one another, this is not guaranteed (e.g., D1 could capture speed and pitch and D2 primarily the leader-lagger vowels, which would result in no or low correlations for either dimension).
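To illustrate why raw dimension scores can fail to align even when the underlying structure is identical, the toy sketch below rotates a small two-dimensional configuration by 90 degrees: every pairwise distance is preserved, yet dimension 1 of the rotated space now lines up with dimension 2 of the original.

```r
# Toy configuration: 5 points in 2 dimensions
conf <- matrix(c(0.1, 0.8, -0.5, 0.3, -0.2,
                 0.4, -0.6, 0.2, -0.9, 0.7), ncol = 2)

# Rotate by 90 degrees; orthogonal rotations preserve all pairwise distances
theta <- pi / 2
rotation <- matrix(c(cos(theta), sin(theta),
                     -sin(theta), cos(theta)), ncol = 2)
conf_rotated <- conf %*% rotation

# The distance structure is identical
all.equal(dist(conf), dist(conf_rotated))

# But rotated dimension 1 is exactly original dimension 2,
# so the raw per-dimension correlations are misleading
cor(conf[, 2], conf_rotated[, 1]) # exactly 1
cor(conf[, 1], conf_rotated[, 1]) # no longer 1
```

This is why a Procrustes rotation is applied before comparing dimension scores across the two analyses.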

As such, to make the two MDS analyses directly comparable, the next code chunk uses the procrustes() function from the vegan package to rotate the pre-registered D1/D2 stimuli scores so that the resulting MDS space is maximally similar to the reported MDS space.

Click here to view code.
# Apply procrustes rotation
fit_PW_rotated <-
  procrustes(fit_df_ID$conf, fit_df_prereg$conf, scores = "sites")

# Extract rotated scores and rename variables
rotated_PW <- fit_PW_rotated$Yrot |>
  as_tibble(.name_repair = "unique") |>
  rename(
    Rotated_D1_PR = `...1`,
    Rotated_D2_PR = `...2`
  )

# Join rotated scores with other variables
dimensions_PW_PR <- bind_cols(dimensions_PW_prereg, rotated_PW)

measures_df <- measures_df %>%
  right_join(dimensions_PW_PR, by = c("SpeakerId" = "Speaker"))

8.2 Testing correlations between reported and pre-registered MDS scores

Figure 20 tests Spearman correlations between the reported D1/D2 scores, and the rotated D1/D2 scores from the MDS applied to the pre-registered data. The correlations indicate that the underlying structure of the perceptual similarity space is stable across the reported and pre-registered data; D1 scores have a very strong, significant positive correlation (r = .92, p < 0.05), as do D2 scores (r = .8, p < 0.05). It is also worth noting that, while not significant, the rotated D1 scores correlate slightly with the reported D2 scores, and the rotated D2 scores with the reported D1 scores, indicating that there may still be some structure that is similar but captured differently across the two dimensions from each analysis.

Click here to view code.
columns <-
  c(
    "D1_PW",
    "D2_PW",
    "Rotated_D1_PR",
    "Rotated_D2_PR"
  )

correlogram2 <- measures_df %>%
  as_tibble() %>%
  select(all_of(columns))

correlogram_function(correlogram2)

Figure 20: Correlogram showing correlations between Dimensions 1 and 2 from the reported and pre-registered analyses. Values in the lower diagonal represent correlation coefficients, while values in the upper diagonal represent p-values.

The consistency of the underlying structure is supported by the comparison in Figure 21. Across both spaces, the same speakers are grouped together and the same speakers are positioned further apart.

Click here to view code.
measures_df %>%
  ggplot(aes(x = D1_PW, y = D2_PW)) +
  geom_label(aes(label = SpeakerId), size = 4) +
  coord_fixed() +
  labs(x = "Dimension 1", y = "Dimension 2", title = "Reported MDS space") +
  measures_df %>%
  ggplot(aes(x = Rotated_D1_PR, y = Rotated_D2_PR)) +
  geom_label(aes(label = SpeakerId), size = 4) +
  coord_fixed() +
  labs(x = "Dimension 1", y = "Dimension 2", title = "Preregistered MDS space")

Figure 21: Reported MDS space compared with the MDS space generated from pre-registered data

8.3 Comparing GAMM results between reported and preregistered data

In this section we apply the same GAMM formula as Section 5.3 to the data filtered based on the preregistration.

8.3.1 Data wrangling

The code chunk below again creates a data frame with unique rows for the speed, pitch, leader-lagger score and back-vowel configuration score for the first and second stimulus in each stimuli pair, but this time for the pre-registered data frame.

Click here to view code.
PW_ratings_prereg <- PW_ratings_prereg %>%
  mutate(
    Stimuli1ID = as.character(Stimuli1ID), Stimuli2ID = as.character(Stimuli2ID),
    count = as.character(count)
  ) %>%
  ungroup()

PW_ratings_filtered_1 <- PW_ratings_prereg %>%
  left_join(measures_df, by = c("Stimuli1ID" = "SpeakerId")) %>%
  select(all_of(variables)) %>%
  rename_with(~ paste("Stim1", .x, sep = "_"), contains(
    c(
      "artic",
      "pitch",
      "_PW",
      "Score"
    )
  )) %>%
  select(
    all_of(variables2),
    pair_id_unordered,
    contains("Stim1"),
    contains("_PW")
  )


PW_ratings_filtered_2 <- PW_ratings_prereg %>%
  left_join(measures_df, by = c("Stimuli2ID" = "SpeakerId")) %>%
  select(all_of(variables)) %>%
  rename_with(~ paste("Stim2", .x, sep = "_"), contains(c(
    "artic",
    "pitch",
    "_PW",
    "Score"
  ))) %>%
  select(
    all_of(variables2),
    pair_id_unordered,
    contains("Stim2"),
    contains("_PW")
  )

PW_ratings_filtered_com <- PW_ratings_filtered_1 %>%
  left_join(
    PW_ratings_filtered_2,
    by = c(
      "workerId",
      "Stimuli1_anon",
      "Stimuli2_anon",
      "enteredResponse",
      "Stimuli_anon",
      "scaledResponse",
      "ReScaledResponse",
      "pair_id_ordered",
      "pair_id_unordered",
      "count"
    )
  ) %>%
  ungroup()

PW_ratings_filtered_com$workerId <- as.factor(PW_ratings_filtered_com$workerId)
PW_ratings_filtered_com$pair_id_unordered <- as.factor(PW_ratings_filtered_com$pair_id_unordered)
PW_ratings_filtered_com$Stimuli1_anon <- as.factor(PW_ratings_filtered_com$Stimuli1_anon)
PW_ratings_filtered_com$Stimuli2_anon <- as.factor(PW_ratings_filtered_com$Stimuli2_anon)
PW_ratings_filtered_com$pair_id_ordered <- as.factor(PW_ratings_filtered_com$pair_id_ordered)

The summary below presents the model output while the tabs in Section 8.3.2 present residual and quantile plots. As in the reported GAMM, the relationships between first and second stimulus articulation rate, pitch and the leader-lagger scores are significant. Also as in the reported GAMM, the relationship between the first and second stimuli back-vowel configuration scores does not significantly predict similarity ratings.

Click here to view code.
gamm_fit_prereg <-
  bam(
    ReScaledResponse ~
      te(Stim1_ArticRate,
        Stim2_ArticRate,
        k = 4
      ) +
      te(Stim1_MeanPitch,
        Stim2_MeanPitch,
        k = 4
      ) +
      te(Stim1_LeaderLaggerScore,
        Stim2_LeaderLaggerScore,
        k = 4
      ) +
      te(Stim1_BackVowelScore,
        Stim2_BackVowelScore,
        k = 4
      ) +
      s(pair_id_ordered, bs = "re") +
      s(workerId, bs = "re"),
    discrete = T,
    nthreads = 4,
    data = PW_ratings_filtered_com
  )
Click here to view code.
summary(gamm_fit_prereg, re.test = F)

Family: gaussian 
Link function: identity 

Formula:
ReScaledResponse ~ te(Stim1_ArticRate, Stim2_ArticRate, k = 4) + 
    te(Stim1_MeanPitch, Stim2_MeanPitch, k = 4) + te(Stim1_LeaderLaggerScore, 
    Stim2_LeaderLaggerScore, k = 4) + te(Stim1_BackVowelScore, 
    Stim2_BackVowelScore, k = 4) + s(pair_id_ordered, bs = "re") + 
    s(workerId, bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.9899     0.0164   365.3   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                                                      edf Ref.df     F  p-value
te(Stim1_ArticRate,Stim2_ArticRate)                 9.483 10.467 5.946  < 2e-16
te(Stim1_MeanPitch,Stim2_MeanPitch)                 7.928  9.114 6.021  < 2e-16
te(Stim1_LeaderLaggerScore,Stim2_LeaderLaggerScore) 4.680  5.108 6.641 3.15e-06
te(Stim1_BackVowelScore,Stim2_BackVowelScore)       5.622  6.564 0.953    0.455
                                                       
te(Stim1_ArticRate,Stim2_ArticRate)                 ***
te(Stim1_MeanPitch,Stim2_MeanPitch)                 ***
te(Stim1_LeaderLaggerScore,Stim2_LeaderLaggerScore) ***
te(Stim1_BackVowelScore,Stim2_BackVowelScore)          
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.177   Deviance explained = 26.1%
fREML = 7120.5  Scale est. = 0.80193   n = 5168

8.3.2 Model diagnostics

While the overall result of the model is stable, it is worth noting that there are outliers in the model residuals here that were not in those of the reported model (See Figure 22, especially the QQ plot). This provides further evidence that there were outliers that the pre-registered filtering failed to identify, but that our adjusted filtering did.

Click here to view code.
# creates normal quantile plots for model residuals

appraise(gamm_fit_prereg)

Figure 22: Plotted model residuals from fitted GAMM

Click here to view code.
plot(gamm_fit_prereg)

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

Contour plots for each interaction term and a quantile plot for the random effect in the fitted GAMM

8.3.3 Model predictions

We can now extract the model estimates in the code chunk below, and plot the significant tensor smooths alongside the individual rated pairs in Figure 23, Figure 24, and Figure 25.

We see effects of first and second stimulus speed, pitch, and the leader-lagger continuum in predicting perceived similarity that are directly comparable to the effects displayed in Figure 13, Figure 14, and Figure 15.

We conclude that our deviation from our preregistration has not materially affected our reported results.

Click here to view code.
# We extract estimates to plot
estimates_prereg <- smooth_estimates(gamm_fit_prereg) %>%
  add_confint()
Click here to view code.
estimates_prereg %>%
  filter(
    .smooth %in% c("te(Stim1_ArticRate,Stim2_ArticRate)"),
    .lower_ci > 0 | .upper_ci < 0
  ) %>%
  ggplot() +
  geom_contour_filled(aes(
    x = Stim1_ArticRate,
    y = Stim2_ArticRate,
    z = .estimate
  )) +
  geom_point(
    data = PW_ratings_com,
    aes(x = Stim1_ArticRate, y = Stim2_ArticRate),
    alpha = 0.05,
    colour = "black"
  ) +
  labs(
    x = "Stim 1 Articulation rate",
    y = "Stim 2 Articulation rate",
    fill = "Similarity Rating"
  )

Figure 23: Model predictions of perceived similarity according to stimuli speed (pre-registered data filtering)

Click here to view code.
estimates_prereg %>%
  filter(
    .smooth %in% c("te(Stim1_MeanPitch,Stim2_MeanPitch)"),
    .lower_ci > 0 | .upper_ci < 0
  ) %>%
  ggplot() +
  geom_contour_filled(aes(
    x = Stim1_MeanPitch,
    y = Stim2_MeanPitch,
    z = .estimate
  )) +
  geom_point(
    data = PW_ratings_com,
    aes(x = Stim1_MeanPitch, y = Stim2_MeanPitch),
    alpha = 0.05,
    colour = "black"
  ) +
  labs(
    x = "Stim 1 Mean pitch",
    y = "Stim 2 Mean pitch",
    fill = "Similarity Rating"
  )

Figure 24: Model predictions of perceived similarity according to stimuli pitch (pre-registered data filtering)

Click here to view code.
estimates_prereg %>%
  filter(
    .smooth %in% c("te(Stim1_LeaderLaggerScore,Stim2_LeaderLaggerScore)"),
    .lower_ci > 0 | .upper_ci < 0
  ) %>%
  ggplot() +
  geom_contour_filled(aes(
    x = Stim1_LeaderLaggerScore,
    y = Stim2_LeaderLaggerScore,
    z = .estimate
  )) +
  geom_point(
    data = PW_ratings_com,
    aes(x = Stim1_LeaderLaggerScore, y = Stim2_LeaderLaggerScore),
    alpha = 0.05,
    colour = "black"
  ) +
  labs(
    x = "Stim 1 Leader-lagger score", y = "Stim 2 Leader-lagger score", fill =
      "Similarity Rating"
  )

Figure 25: Model predictions of perceived similarity according to stimuli leader-lagger scores (pre-registered data filtering)

9 Packages Used

We used R version 4.2.2 (R Core Team 2022) and the following R packages: colorspace v. 2.1.0 (Zeileis, Hornik, and Murrell 2009; Stauffer et al. 2009; Zeileis et al. 2020), corrplot v. 0.92 (Wei and Simko 2021), cowplot v. 1.1.3 (Wilke 2024), dials v. 1.2.1 (Kuhn and Frick 2024), e1071 v. 1.7.14 (Meyer et al. 2023), ggalt v. 0.4.0 (Rudis, Bolker, and Schulz 2017), ggcorrplot v. 0.1.4.1 (Kassambara 2023), gratia v. 0.9.0 (Simpson 2024), here v. 1.0.1 (Müller 2020), infer v. 1.0.7 (Couch et al. 2021), kableExtra v. 1.4.0 (Zhu 2024), knitr v. 1.46 (Xie 2014, 2015, 2024), lattice v. 0.22.6 (Sarkar 2008), magick v. 2.8.5 (Ooms 2024), mgcv v. 1.9.1 (S. N. Wood 2003, 2004, 2011; S. N. Wood et al. 2016; S. N. Wood 2017), modeldata v. 1.3.0 (Kuhn 2024a), nlme v. 3.1.164 (J. C. Pinheiro and Bates 2000; J. Pinheiro, Bates, and R Core Team 2023), nzilbb.vowels v. 0.3.1 (Wilson Black et al. 2023), parsnip v. 1.2.1 (Kuhn and Vaughan 2024), patchwork v. 1.2.0 (Pedersen 2024), permute v. 0.9.7 (Simpson 2022), plotrix v. 3.8.4 (J 2006), randomForest v. 4.7.1.1 (Liaw and Wiener 2002), recipes v. 1.0.10 (Kuhn, Wickham, and Hvitfeldt 2024), rpart v. 4.1.23 (Therneau and Atkinson 2023), rpart.plot v. 3.1.2 (Milborrow 2024), rsample v. 1.2.1 (Frick et al. 2024), scales v. 1.3.0 (Wickham, Pedersen, and Seidel 2023), smacof v. 2.1.6 (de Leeuw and Mair 2009; Mair, Groenen, and de Leeuw 2022), tidymodels v. 1.2.0 (Kuhn and Wickham 2020), tidyverse v. 2.0.0 (Wickham et al. 2019), tune v. 1.2.1 (Kuhn 2024b), vegan v. 2.6.4 (Oksanen et al. 2022), vip v. 0.4.1 (Greenwell and Boehmke 2020), workflows v. 1.1.4 (Vaughan and Couch 2024), workflowsets v. 1.1.0 (Kuhn and Couch 2024), yardstick v. 1.3.1 (Kuhn, Vaughan, and Hvitfeldt 2024).

References

Baer, Ruth A., James Ballenger, David T. T. Berry, and Martha W. Wetter. 1997. “Detection of Random Responding on the MMPI-A.” Journal Article. Journal of Personality Assessment 68 (1). https://doi.org/10.1207/s15327752jpa6801_11.
Berry, David T. T., Martha W. Wetter, Ruth A. Baer, Lene Larsen, Cynthia Clark, and Keith Monroe. 1992. “MMPI-2 Random Responding Indices: Validation Using a Self-Report Methodology.” Journal Article. Psychological Assessment 4: 340–45. https://doi.org/10.1037/1040-3590.4.3.340.
Boersma, Paul. 2001. “Praat, a System for Doing Phonetics by Computer.” Journal Article. Glot International 5 (9/10): 341–45.
Brand, James, Jen Hay, Lynn Clark, Kevin Watson, and Márton Sóskuthy. 2021. “Systematic Co-Variation of Monophthongs Across Speakers of New Zealand English.” Journal Article. Journal of Phonetics 88: 101096. https://doi.org/10.1016/j.wocn.2021.101096.
Clopper, Cynthia G, and Ann R Bradlow. 2009. “Free Classification of American English Dialects by Native and Non-Native Listeners.” Journal Article. Journal of Phonetics 37 (4): 436–51.
Clopper, Cynthia G, and David B Pisoni. 2007. “Free Classification of Regional Dialects of American English.” Journal Article. Journal of Phonetics 35 (3): 421–38.
Couch, Simon P., Andrew P. Bray, Chester Ismay, Evgeni Chasnovski, Benjamin S. Baumer, and Mine Çetinkaya-Rundel. 2021. infer: An R Package for Tidyverse-Friendly Statistical Inference.” Journal of Open Source Software 6 (65): 3661. https://doi.org/10.21105/joss.03661.
de Leeuw, Jan, and Patrick Mair. 2009. “Multidimensional Scaling Using Majorization: SMACOF in r.” Journal of Statistical Software 31. https://doi.org/10.18637/jss.v031.i03.
Frick, Hannah, Fanny Chow, Max Kuhn, Michael Mahoney, Julia Silge, and Hadley Wickham. 2024. rsample: General Resampling Infrastructure. https://CRAN.R-project.org/package=rsample.
Galesic, Mirta, and Michael Bosnjak. 2009. “Effects of Questionnaire Length on Participation and Indicators of Response Quality in a Web Survey.” Journal Article. Public Opinion Quarterly 73 (2): 349–60. https://doi.org/10.1093/poq/nfp031.
Greenwell, Brandon M., and Bradley C. Boehmke. 2020. “Variable Importance Plots—an Introduction to the Vip Package.” The R Journal 12 (1): 343–66. https://doi.org/10.32614/RJ-2020-013.
Hurring, Gia, Josh Wilson Black, Jen Hay, and Lynn Clark. Accepted. “How Stable Are Patterns of Covariation Across Time?” Journal Article. Language Variation and Change, Accepted.
Lemon, J. 2006. “Plotrix: A Package in the Red Light District of R.” R-News 6 (4): 8–12.
Kassambara, Alboukadel. 2023. ggcorrplot: Visualization of a Correlation Matrix Using ggplot2. https://CRAN.R-project.org/package=ggcorrplot.
Kuhn, Max. 2024a. modeldata: Data Sets Useful for Modeling Examples. https://CRAN.R-project.org/package=modeldata.
———. 2024b. tune: Tidy Tuning Tools. https://CRAN.R-project.org/package=tune.
Kuhn, Max, and Simon Couch. 2024. workflowsets: Create a Collection of tidymodels Workflows. https://CRAN.R-project.org/package=workflowsets.
Kuhn, Max, and Hannah Frick. 2024. dials: Tools for Creating Tuning Parameter Values. https://CRAN.R-project.org/package=dials.
Kuhn, Max, and Davis Vaughan. 2024. parsnip: A Common API to Modeling and Analysis Functions. https://CRAN.R-project.org/package=parsnip.
Kuhn, Max, Davis Vaughan, and Emil Hvitfeldt. 2024. yardstick: Tidy Characterizations of Model Performance. https://CRAN.R-project.org/package=yardstick.
Kuhn, Max, and Hadley Wickham. 2020. Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles. https://www.tidymodels.org.
Kuhn, Max, Hadley Wickham, and Emil Hvitfeldt. 2024. recipes: Preprocessing and Feature Engineering Steps for Modeling. https://CRAN.R-project.org/package=recipes.
Liaw, Andy, and Matthew Wiener. 2002. “Classification and Regression by randomForest.” R News 2 (3): 18–22. https://CRAN.R-project.org/doc/Rnews/.
Mair, Patrick, Ingwer Borg, and Thomas Rusch. 2016. “Goodness-of-Fit Assessment in Multidimensional Scaling and Unfolding.” Journal Article. Multivariate Behavioral Research 51 (6): 772–89. https://doi.org/10.1080/00273171.2016.1235966.
Mair, Patrick, Patrick J. F. Groenen, and Jan de Leeuw. 2022. “More on Multidimensional Scaling and Unfolding in R: smacof Version 2.” Journal Article. Journal of Statistical Software 102 (10): 1–46. https://doi.org/10.18637/jss.v102.i10.
McDougall, Kirsty. 2013. “Assessing Perceived Voice Similarity Using Multidimensional Scaling for the Construction of Voice Parades.” Journal Article. The International Journal of Speech, Language and the Law 20 (2): 163–72. https://doi.org/10.1558/ijsll.v20i2.163.
Meyer, David, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel, and Friedrich Leisch. 2023. E1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. https://CRAN.R-project.org/package=e1071.
Milborrow, Stephen. 2024. rpart.plot: Plot rpart Models: An Enhanced Version of plot.rpart. https://CRAN.R-project.org/package=rpart.plot.
Müller, Kirill. 2020. here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.
Nolan, Francis, Kirsty McDougall, and Toby Hudson. 2011. “Some Acoustic Correlates of Perceived (Dis)similarity Between Same-Accent Voices.” Book Section. In Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS XVII): August 17-21, edited by Wai-Sum Lee and Eric Zee, 1506–9. Hong Kong: City University of Hong Kong.
Oksanen, Jari, Gavin L. Simpson, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter R. Minchin, R. B. O’Hara, et al. 2022. vegan: Community Ecology Package. https://CRAN.R-project.org/package=vegan.
Ooms, Jeroen. 2024. magick: Advanced Graphics and Image-Processing in r. https://CRAN.R-project.org/package=magick.
Pedersen, Thomas Lin. 2024. patchwork: The Composer of Plots. https://CRAN.R-project.org/package=patchwork.
Pinheiro, José C., and Douglas M. Bates. 2000. Mixed-Effects Models in s and s-PLUS. New York: Springer. https://doi.org/10.1007/b98882.
Pinheiro, José, Douglas Bates, and R Core Team. 2023. nlme: Linear and Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rudis, Bob, Ben Bolker, and Jan Schulz. 2017. ggalt: Extra Coordinate Systems, “Geoms”, Statistical Transformations, Scales and Fonts for ggplot2. https://CRAN.R-project.org/package=ggalt.
Sarkar, Deepayan. 2008. Lattice: Multivariate Data Visualization with r. New York: Springer. http://lmdvr.r-forge.r-project.org.
Simpson, Gavin L. 2022. permute: Functions for Generating Restricted Permutations of Data. https://CRAN.R-project.org/package=permute.
———. 2024. gratia: Graceful ggplot-Based Graphics and Other Functions for GAMs Fitted Using mgcv. https://gavinsimpson.github.io/gratia/.
Stauffer, Reto, Georg J. Mayr, Markus Dabernig, and Achim Zeileis. 2009. “Somewhere over the Rainbow: How to Make Effective Use of Colors in Meteorological Visualizations.” Bulletin of the American Meteorological Society 96 (2): 203–16. https://doi.org/10.1175/BAMS-D-13-00155.1.
Therneau, Terry, and Beth Atkinson. 2023. rpart: Recursive Partitioning and Regression Trees. https://CRAN.R-project.org/package=rpart.
Vaughan, Davis, and Simon Couch. 2024. workflows: Modeling Workflows. https://CRAN.R-project.org/package=workflows.
Viera, Vasco M. N. C. C. 2012. “Permutation Tests to Estimate Significances on Principal Components Analysis.” Journal Article. Computational Ecology and Software 2 (2): 103–23.
Wei, Taiyun, and Viliam Simko. 2021. R Package corrplot: Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Thomas Lin Pedersen, and Dana Seidel. 2023. scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.
Wilke, Claus O. 2024. cowplot: Streamlined Plot Theme and Plot Annotations for ggplot2. https://CRAN.R-project.org/package=cowplot.
Wilson Black, Joshua, James Brand, Jen Hay, and Lynn Clark. 2023. “Using Principal Component Analysis to Explore Co‐variation of Vowels.” Language and Linguistics Compass 17 (1): e12479. https://doi.org/10.1111/lnc3.12479.
Wood, S. N. 2017. Generalized Additive Models: An Introduction with r. 2nd ed. Chapman; Hall/CRC.
Wood, S. N. 2003. “Thin-Plate Regression Splines.” Journal of the Royal Statistical Society (B) 65 (1): 95–114.
———. 2004. “Stable and Efficient Multiple Smoothing Parameter Estimation for Generalized Additive Models.” Journal of the American Statistical Association 99 (467): 673–86.
———. 2011. “Fast Stable Restricted Maximum Likelihood and Marginal Likelihood Estimation of Semiparametric Generalized Linear Models.” Journal of the Royal Statistical Society (B) 73 (1): 3–36.
Wood, S. N., N. Pya, and B. Säfken. 2016. “Smoothing Parameter and Model Selection for General Smooth Models (with Discussion).” Journal of the American Statistical Association 111: 1548–75.
Wright, Marvin N, and Andreas Ziegler. 2017. “Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal Article. Journal of Statistical Software 77 (1): 1–17. https://doi.org/10.18637/jss.v077.i01.
Wright, Marvin N, Andreas Ziegler, and Inke R König. 2016. “Do Little Interactions Get Lost in Dark Random Forests?” BMC Bioinformatics 17: 1–10.
Xie, Yihui. 2014. knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.org/knitr/.
———. 2024. knitr: A General-Purpose Package for Dynamic Report Generation in r. https://yihui.org/knitr/.
Zeileis, Achim, Jason C. Fisher, Kurt Hornik, Ross Ihaka, Claire D. McWhite, Paul Murrell, Reto Stauffer, and Claus O. Wilke. 2020. colorspace: A Toolbox for Manipulating and Assessing Colors and Palettes.” Journal of Statistical Software 96 (1): 1–49. https://doi.org/10.18637/jss.v096.i01.
Zeileis, Achim, Kurt Hornik, and Paul Murrell. 2009. “Escaping RGBland: Selecting Colors for Statistical Graphics.” Computational Statistics & Data Analysis 53 (9): 3259–70. https://doi.org/10.1016/j.csda.2008.11.033.
Zhu, Hao. 2024. kableExtra: Construct Complex Table with kable and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.

Footnotes

  1. New Zealand European.↩︎

  2. We note that vowels are counted as being “present” regardless of stress.↩︎

  3. The only discernible exception is speaker 37, who moves substantially along D2. This may reflect speaker 37 being a fast but low-pitch lagger, and consequently less perceptually stable (i.e., they could be grouped with either the fast laggers or the slower/lower-pitch laggers).↩︎